BINDING DOMAINS FROM PLASMODIUM VIVAX AND
PLASMODIUM FALCIPARUM ERYTHROCYTE BINDING PROTEINS

BACKGROUND OF THE INVENTION 

Malaria infects 200 to 400 million people each year, causing 1 to 2 million deaths, and thus remains one
of the most important infectious diseases in the world. Approximately 25 percent of all deaths of children in rural
Africa between the ages of one and four years are caused by malaria. Due to the importance of the disease as a
worldwide health problem, considerable effort is being expended to identify and develop malaria vaccines.

Malaria in humans is caused by four species of the parasite Plasmodium: P. falciparum, P. vivax,
P. knowlesi and P. malariae. The major cause of malaria in humans is P. falciparum, which infects 200 million to
400 million people every year, killing 1 to 4 million.

Duffy Antigen Binding Protein (DABP) and Sialic Acid Binding Protein (SABP) are soluble proteins
that appear in the culture supernatant after infected erythrocytes release merozoites. Immunochemical data indicate
that DABP and SABP, which are the respective ligands for the P. vivax and P. falciparum Duffy and sialic acid
receptors on erythrocytes, possess binding specificities that are identical in soluble and membrane-bound form.

DABP is a 135 kDa protein which binds specifically to Duffy blood group determinants (Wertheimer
et al., Exp. Parasitol. 69: 340-350 (1989); Barnwell et al., J. Exp. Med. 169: 1795-1802 (1989)). Thus, binding
of DABP is specific to human Duffy positive erythrocytes. There are four major Duffy phenotypes for human
erythrocytes: Fy(a), Fy(b), Fy(ab) and Fy(negative), as defined by the anti-Fy a and anti-Fy b sera (Hadley et al., In Red
Cell Antigens and Antibodies, G. Garratty, ed. (Arlington, Va.: American Association of Blood Banks) pp. 17-33 (1986)).
DABP binds equally to both Fy(a) and Fy(b) erythrocytes, which are equally susceptible to invasion by P. vivax, but
not to Fy(negative) erythrocytes.

In the case of SABP, a 175 kDa protein, binding is specific to the glycophorin sialic acid residues
on erythrocytes (Camus and Hadley, Science 230:553-556 (1985); Orlandi et al., J. Cell Biol. 116:901-909 (1992)).
Thus, neuraminidase treatment (which cleaves off sialic acid residues) renders erythrocytes immune to P. falciparum
invasion.

The specificities of binding and correlation to invasion by the parasite thus indicate that DABP
and SABP are the proteins of P. vivax and P. falciparum which interact with the Duffy antigen and sialic acids,
respectively, on the erythrocyte. The genes encoding both proteins have been cloned and the DNA and predicted
protein sequences have been determined (B. Kim Lee Sim et al., J. Cell Biol. 111: 1877-1884 (1990); Fang, X., et al.,
Mol. Biochem. Parasitol. 44: 125-132 (1991)).

Despite considerable research efforts worldwide, because of the complexity of the Plasmodium
parasite and its interaction with its host, it has not been possible to discover a satisfactory solution for prevention
or abatement of the blood stage of malaria. Because malaria is such a large worldwide health problem, there is
a need for methods that abate the impact of this disease. The present invention provides effective preventive and
therapeutic measures against Plasmodium invasion.
SUMMARY OF THE INVENTION

The present invention provides compositions comprising isolated DABP binding domain
polypeptides and/or isolated SABP binding domain polypeptides. The DABP binding domain polypeptides preferably
comprise between about 200 and about 300 amino acid residues, while the SABP binding domain polypeptides
preferably comprise between about 200 and about 600 amino acid residues. A preferred DABP binding domain
polypeptide has about 325 residues of the amino acid sequence found in SEQ ID NO:2. A preferred SABP binding
domain polypeptide has about 616 residues of the amino acid sequence of SEQ ID NO:4, encoded by the DNA
sequence of SEQ ID NO:3. The preferred DABP binding domain and SABP binding domain include the cysteine-rich
portions of the proteins shown in Figure 1.

The present invention also includes pharmaceutical compositions comprising a pharmaceutically
acceptable carrier and an isolated DABP binding domain polypeptide in an amount sufficient to induce a protective
immune response to Plasmodium vivax merozoites in an organism. In addition, isolated SABP binding domain
polypeptide in an amount sufficient to induce a protective immune response to Plasmodium falciparum may be added
to the pharmaceutical composition.

Also provided are pharmaceutical compositions comprising a pharmaceutically acceptable carrier
and an isolated SABP binding domain polypeptide in an amount sufficient to induce a protective immune response
to Plasmodium falciparum merozoites in an organism. In addition, isolated DABP binding domain polypeptide in an
amount sufficient to induce a protective immune response to Plasmodium vivax may be added to the pharmaceutical
composition.

Isolated polynucleotides which encode DABP binding domain polypeptides or SABP binding domain
polypeptides are also disclosed. In addition, the present invention includes a recombinant cell comprising the
polynucleotide encoding the DABP binding domain polypeptide.

The current invention further includes methods of inducing a protective immune response to
Plasmodium merozoites in a patient. The methods comprise administering to the patient an immunologically effective
amount of a pharmaceutical composition comprising a pharmaceutically acceptable carrier and an isolated DABP
binding domain polypeptide, an SABP binding domain polypeptide or a combination thereof.

The present disclosure also provides DNA sequences from additional P. falciparum genes in the
Duffy-binding like (DBL) family that have regions conserved with the P. falciparum 175 kD and P. vivax 135 kD
binding proteins.

DEFINITIONS 

As used herein a "DABP binding domain polypeptide" or a "SABP binding domain polypeptide" are
polypeptides substantially identical (as defined below) to a sequence from the cysteine-rich, amino-terminal region of
the Duffy antigen binding protein (DABP) or sialic acid binding protein (SABP), respectively. Such polypeptides are
capable of binding either the Duffy antigen or sialic acid residues on glycophorin. In particular, DABP binding domain
polypeptides consist of amino acid residues substantially similar to a sequence of SABP within a binding domain
containing the cysteine-rich sequence shown in Figure 1. SABP binding domain polypeptides consist of residues
substantially similar to a sequence of DABP within a binding domain containing the cysteine-rich sequence shown
in Figure 1.

The binding domain polypeptides encoded by the genes of the DBL family consist of those residues
substantially identical to the sequence of the binding domains of DABP and SABP as defined above. The DBL family
comprises sequences with substantial similarity to the conserved regions of the DABP and SABP. These include
those sequences reported here as ebl-1 (SEQ ID NO:5 and SEQ ID NO:6), E31a (SEQ ID NO:7 and SEQ ID NO:8),
var-7 (SEQ ID NO:13 and SEQ ID NO:14, GenBank Accession No. L42636) and var-1 (SEQ ID NO:15 and SEQ ID
NO:16, GenBank Accession No. L40608). The sequence ebl-2 (SEQ ID NO:9 and SEQ ID NO:10) represents the
binding domains of var-7, and Proj3 (SEQ ID NO:11 and SEQ ID NO:12) is the binding domain of var-1. The DBL
family also includes two other members, var-2 and var-3 (GenBank Accession No. L40609).

The polypeptides of the invention can consist of the full length binding domain or a fragment
thereof. Typically DABP binding domain polypeptides will consist of from about 50 to about 325 residues, preferably
between about 75 and 300, more preferably between about 100 and about 250 residues. SABP binding domain
polypeptides will consist of from about 50 to about 616 residues, preferably between about 75 and 300, more
preferably between about 100 and about 250 residues.

Particularly preferred polypeptides of the invention are those within the binding domain that are 
conserved between SABP and the DBL family. Residues within these conserved domains are shown in Figure 1, 
below. 

Two polynucleotides or polypeptides are said to be "identical" if the sequence of nucleotides or
amino acid residues in the two sequences is the same when aligned for maximum correspondence. Optimal alignment
of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl.
Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970),
by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444 (1988), by
computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics
Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by inspection. The
term "substantial identity" means that a polypeptide comprises a sequence that has at least 80% sequence identity,
preferably 90%, more preferably 95% or more, compared to a reference sequence over a comparison window of
about 20 residues to about 600 residues, typically about 50 to about 500 residues, usually about 250 to 300
residues. The values of percent identity are determined using the programs above. Particularly preferred peptides
of the present invention comprise a sequence in which at least 70% of the cysteine residues conserved in DABP and
SABP are present. Additionally, the peptide will comprise a sequence in which at least 50% of the tryptophan
residues conserved in DABP and SABP are present. The term substantial similarity is also specifically defined here
with respect to those amino acid residues found to be conserved between DABP, SABP and the sequences of the
DBL family. These conserved amino acids consist prominently of tryptophan and cysteine residues conserved among
all sequences reported here. In addition the conserved amino acid residues include phenylalanine residues which may
be substituted with tyrosine. These amino acid residues may be determined to be conserved after the sequences
have been aligned using methods outlined above by someone skilled in the art.
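
The thresholds in the preceding definition (at least 80% identity, at least 70% of conserved cysteines, at least 50% of conserved tryptophans) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by one of the programs named above and are supplied as equal-length strings with "-" for gaps. The function names, the toy sequences, and the choice to count identity only over ungapped columns are assumptions made for illustration, not the procedure of any particular alignment package.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: identity is counted over ungapped columns only; alignment
    programs differ in how they define the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def conserved_fraction(reference: str, candidate: str, residue: str) -> float:
    """Fraction of `residue` positions in the aligned reference that the candidate retains."""
    if len(reference) != len(candidate):
        raise ValueError("aligned sequences must have equal length")
    reference, candidate = reference.upper(), candidate.upper()
    positions = [i for i, r in enumerate(reference) if r == residue]
    if not positions:
        return 0.0
    kept = sum(1 for i in positions if candidate[i] == residue)
    return kept / len(positions)


# Hypothetical toy alignment, not taken from any SEQ ID NO.
REFERENCE = "CWKDC-FWC"
CANDIDATE = "CWRDCAF-C"

identity = percent_identity(REFERENCE, CANDIDATE)
cys_kept = conserved_fraction(REFERENCE, CANDIDATE, "C")
trp_kept = conserved_fraction(REFERENCE, CANDIDATE, "W")

# Thresholds as stated in the definition above.
meets_definition = identity >= 80.0 and cys_kept >= 0.70 and trp_kept >= 0.50
print(f"identity={identity:.1f}% cys={cys_kept:.2f} trp={trp_kept:.2f} ok={meets_definition}")
```

In practice the comparison window would span roughly 250 to 300 residues rather than the nine columns used here.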

Another indication that polypeptide sequences are substantially identical is if one protein is
immunologically reactive with antibodies raised against the other protein. Thus, the polypeptides of the invention
include polypeptides immunologically reactive with antibodies raised against the SABP binding domain, the DABP
binding domain or raised against the conserved regions of the DBL family.

Another indication that nucleotide sequences are substantially identical is if two molecules hybridize
to each other under stringent conditions. Stringent conditions are sequence dependent and will be different in
different circumstances. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting
point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined
ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Typically,
stringent conditions will be those in which the salt concentration is about 0.02 molar at pH 7 and the temperature
is at least about 60°C.
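
As a rough illustration of the "about 5°C below Tm" rule, the following Python sketch estimates Tm for a short oligonucleotide and derives a hybridization temperature from it. The Tm estimate uses the Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is an assumption introduced here for illustration only; it is not specified in this text, it ignores ionic strength and pH, and it is only meaningful for short probes. The probe sequence is hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: a rule of thumb for short oligonucleotides; it does not
    account for ionic strength or pH.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(tm_celsius: float, offset: float = 5.0) -> float:
    """Hybridization temperature set about `offset` degrees C below Tm."""
    return tm_celsius - offset


PROBE = "ATGCATGCATGCATGC"  # hypothetical 16-mer
tm = wallace_tm(PROBE)
print(f"estimated Tm = {tm:.0f} C, stringent hybridization at about {stringent_temperature(tm):.0f} C")
```

For longer probes or defined salt conditions, an empirically measured Tm or a salt-corrected formula would replace the estimate; the 5°C offset is the only part taken from the text.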

Nucleotide sequences are also substantially identical for purposes of this application when the
polypeptides which they encode are substantially identical. Thus, where one nucleic acid sequence encodes
essentially the same polypeptide as a second nucleic acid sequence, the two nucleic acid sequences are substantially
identical, even if they would not hybridize under stringent conditions due to silent substitutions permitted by the
genetic code (see, Darnell et al. (1990) Molecular Cell Biology, Second Edition, Scientific American Books, W.H.
Freeman and Company, New York, NY, for an explanation of codon degeneracy and the genetic code).
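
The effect of silent substitutions can be shown with a short Python sketch that translates two DNA sequences with the standard genetic code and compares the encoded polypeptides. The two sequences are hypothetical and were chosen so that they differ at four nucleotide positions yet encode the same peptide; the function names are likewise illustrative.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def encode_same_polypeptide(dna_a: str, dna_b: str) -> bool:
    """True when both sequences translate to the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


# Hypothetical sequences differing only by silent substitutions.
SEQ_A = "TGTTGGTTAGAT"
SEQ_B = "TGCTGGCTGGAC"

mismatches = sum(1 for a, b in zip(SEQ_A, SEQ_B) if a != b)
print(translate(SEQ_A), translate(SEQ_B), mismatches, encode_same_polypeptide(SEQ_A, SEQ_B))
```

Here a third of the nucleotide positions differ, which could prevent hybridization under stringent conditions, while the encoded polypeptides are identical.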

The phrases "isolated" or "biologically pure" refer to material which is substantially or essentially
free from components which normally accompany it as found in its native state. Thus, the binding domain
polypeptides of this invention do not contain materials normally associated with their in situ environment, e.g., other
proteins from a merozoite membrane. Typically, isolated proteins of the invention are at least about 80% pure,
usually at least about 90%, and preferably at least about 95% as measured by band intensity on a silver stained
gel.

Protein purity or homogeneity may be indicated by a number of means well known in the art, such 
as polyacrylamide gel electrophoresis of a protein sample, followed by visualization upon staining. For certain 
purposes high resolution will be needed and HPLC or a similar means for purification utilized. 

The term "residue" refers to an amino acid (D or L) or amino acid mimetic incorporated in an
oligopeptide by an amide bond or amide bond mimetic. An amide bond mimetic of the invention includes peptide
backbone modifications well known to those skilled in the art.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 represents an alignment of the predicted amino acid sequences of the DABP binding
domain (Vivax) (SEQ ID NO:25), the two homologous SABP domains (SABP F1 (SEQ ID NO:26) and SABP F2 (SEQ
ID NO:27)) and the sequenced members of the DBL gene family (ebl-1 (SEQ ID NO:28), E31a (SEQ ID NO:29), ebl-2
(SEQ ID NO:30)) and the three homologous Proj3 domains (F1 (SEQ ID NO:31), F2 (SEQ ID NO:32) and F3 (SEQ ID
NO:33)).

Figure 2 represents a schematic of the pRE4 cloning vector. 

Figure 3 shows primers useful for isolating sequences encoding the conserved motifs of the
invention. Primers UNIEBP5 (SEQ ID NO:35) and UNIEBP5A (SEQ ID NO:36) encode the amino acid sequence of SEQ
ID NO:34; primers UNIEBP5B (SEQ ID NO:38) and UNIEBP5C (SEQ ID NO:39) encode the amino acid sequence of
SEQ ID NO:37; primers UNIEBP3 (SEQ ID NO:41) and UNIEBP3A (SEQ ID NO:42) encode the amino acid sequence
of SEQ ID NO:40; and primers UNIEBP3B (SEQ ID NO:44) and UNIEBP3C (SEQ ID NO:45) encode the amino acid
sequence of SEQ ID NO:43.

Figure 4 shows the relative position of the E31a ORF on chromosome 7. 

Figure 5 shows a map of a var gene cluster on chromosome 7. Relative positions of four YACs
(PfYEF2, PfYFE6, PfYKF8, PfYED9) are indicated under the chromosome 7 line at the top of the figure. YACs PfYFE6
and PfYKF8 lie entirely within a segment linked to CQR in a genetic cross, whereas YACs PfYED9 and PfYEF2 extend
beyond sites (identified by pE53a and pH270.5) that are dissociated from the chloroquine response. The var cluster
extends over a region of 100-150 kb in PfYED9. Exons and introns of the var-1, var-2 and var-3 genes within the
sequenced 40 kb segment are represented by solid and dotted lines, respectively; arrows show the coding direction.
Two more var elements outside of the sequenced region, identified by conserved restriction sites and
cross-hybridization, are indicated by dashed lines (var-2c and var-3c). Bold letters mark repeated restriction sites that
suggest a duplication in the var-2/var-3 and var-2c/var-3c segments. Enzyme recognition sites: A, ApaI; B, Bgh\; C,
f/dt; D, HindIII; E, HaeIII; H, BssHII; K, KpnI; M, BamHI; P, HpaI; S, SmaI. HindIII and HaeIII sites outside of the
sequenced region were not mapped. Positions and sizes of inserts from the Dd2 subsegment library are indicated:

a, pE280b; b, pB20.3; c, pBBOO; d, pE21b; e, pB20J24; f, pE32b; h, pE241a; i, pE240af51d; j, pE33a; k, pB20.23;
l, /IL17BA6; m, pB20.26; n, pB20SU.27; o, p15J2J3. Inserts from the PfYED9 34 kb ApaI/SmaI fragment library:
r, pB3; s, p3G11; t, pJVs; u, p2E10; v, p1G3; w, p2E3; x, p2B6; y, PE10; z, pJYr; α, pC5; β, p1A3; γ, p1F6; δ,
p3C3; ε, pA2; ζ, p2A9; η, p3C4; θ, pJZn; κ, p3D8.

DESCRIPTION OF THE PREFERRED EMBODIMENT 

The binding of merozoites and schizonts to erythrocytes is mediated by specific binding proteins
on the surface of the merozoite or schizont and is necessary for erythrocyte invasion. In the case of P. falciparum,
this binding involves specific interaction between sialic acid glycophorin residues on the erythrocyte and the sialic
acid binding protein (SABP) on the surface of the merozoite or schizont. The ability of purified SABP to bind
erythrocytes with chemically or enzymatically altered sialic acid residues paralleled the ability of P. falciparum to
invade these erythrocytes. Furthermore, sialic acid deficient erythrocytes neither bind SABP nor support invasion
by P. falciparum. The DNA encoding SABP from P. falciparum has also been cloned and sequenced.

In P. vivax, specific binding to the erythrocytes involves interaction between the Duffy blood group
antigen on the erythrocyte and the Duffy antigen binding protein (DABP) on the merozoite. Duffy binding proteins
were defined biologically as those soluble proteins that appear in the culture supernatant after the infected
erythrocytes release merozoites, and which bind to human Duffy positive, but not to human Duffy negative, erythrocytes.
It has been shown that binding of the P. vivax DABP protein to Duffy positive erythrocytes is blocked by antisera
to the Duffy blood group determinants. Purified Duffy blood group antigens also block the binding to erythrocytes.
DABP has also been shown to bind Duffy blood group determinants on Western blots.

Duffy positive blood group determinants on human erythrocytes are essential for invasion of human
erythrocytes by Plasmodium vivax. Both attachment and reorientation of P. vivax merozoites occur equally well on
Duffy positive and negative erythrocytes. A junction then forms between the apical end of the merozoite and the
Duffy-positive erythrocyte, followed by vacuole formation and entry of the merozoite into the vacuole. Junction
formation and merozoite entry into the erythrocyte do not occur on Duffy negative cells, suggesting that the receptor
specific for the Duffy determinant is involved in apical junction formation but not initial attachment. The DNA
sequences encoding the DABP from P. vivax and P. knowlesi have been cloned and sequenced.

P. vivax red cell invasion has an absolute requirement for the Duffy blood group antigen. Isolates
of P. falciparum, however, vary in their dependency on sialic acid for invasion. Certain P. falciparum clones have
been developed which invade sialic acid deficient erythrocytes at normal rates. This suggests that certain strains
of P. falciparum can interact with other ligands on the erythrocyte and so may possess multiple erythrocyte binding
proteins with differing specificities.

A basis for the present invention is the discovery of the binding domains in both DABP and SABP.
Comparison of the predicted protein sequences of DABP and SABP reveals an amino-terminal, cysteine-rich region
in both proteins with a high degree of similarity between the two proteins. The amino-terminal, cysteine-rich region
of DABP contains about 325 amino acids, whereas the amino-terminal, cysteine-rich region of SABP contains about
616 amino acids. This is due to an apparent duplication of the amino-terminal, cysteine-rich region in the SABP
protein. The cysteine residues are conserved between the two regions of SABP and DABP, as are the amino acids
surrounding the cysteine residues and a number of aromatic amino acid residues in this region. The amino-terminal,
cysteine-rich region and another cysteine-rich region near the carboxyl-terminus show the most similarity between
the DABP and SABP proteins. The region of the amino acid sequence between these two cysteine-rich regions shows
only limited similarity between DABP and SABP.

Other P. falciparum open reading frames and genes with regions that have substantial identity to
binding domains of SABP and DABP have been identified. Multiple copies of these sequences exist in the parasite
genome, indicating their important activity in host-parasite interactions. A family of these sequences (the DBL family)
has been cloned from chromosome 7 subsegment libraries that were constructed during genetic studies of the
chloroquine resistance locus (Wellems et al., PNAS 88: 3382-3386 (1991)). Certain of these transcripts are known
to be from the var family of genes that modulate cytoadherence and antigenic variation of P. falciparum-infected
erythrocytes (see Example 3, below).

Genes of the P. falciparum var family encode 200-350 kD variant surface molecules that determine
antigenic and adhesive properties of parasitized erythrocytes. The large repertoire of var genes (50-150 copies,
having sufficient DNA to account for 2-6% of the haploid genome), the dramatic sequence variation among the gene
copies, their variable expression in different parasite lines, the ready detection of DNA rearrangements, and the
receptor binding features of the encoded extracellular domains all implicate var genes as the major determinants of
antigenic variation and cytoadherence in P. falciparum malaria.

A second class of DBL-encoding transcripts includes single-copy genes such as ebl-1. Genetic
linkage studies have placed this gene within a region of chromosome 13 that affects invasion of malarial parasites
in human red blood cells (Wellems et al., Cell 49:633-642 (1987)). Both SABP and ebl-1 show restriction patterns
that are well conserved among different parasite isolates. This conservation of gene structure and the sequence
relationships between the ebl-1 and SABP domains suggest that ebl-1 encodes a novel erythrocyte binding molecule
having receptor properties distinct from those of SABP.

Southern hybridization experiments using probes from these open reading frames have indicated
that additional copies of these conserved sequences are located elsewhere in the genome. The largest of the open
reading frames on chromosome 7 is 8 kilobases and contains four tandem repeats homologous to the N-terminal,
cysteine-rich unit of SABP and DABP.

Figure 1 represents an alignment of the DBL family with the DABP binding domain and two
homologous regions of SABP (F1 and F2). The DBL family is divided into two subfamilies to achieve optimal
alignment. Conserved cysteine residues are shown in bold face and conserved aromatic residues are underlined.

The polypeptides of the invention can be used to raise monoclonal antibodies specific for the
binding domains of SABP, DABP or the conserved regions in the DBL gene family. The antibodies can be used for
diagnosis of malarial infection or as therapeutic agents to inhibit binding of merozoites to erythrocytes. The
production of monoclonal antibodies against a desired antigen is well known to those of skill in the art and is not
reviewed in detail here.

The multitude of techniques available to those skilled in the art for production and manipulation
of various immunoglobulin molecules can thus be readily applied to inhibit binding. As used herein, the terms
"immunoglobulin" and "antibody" refer to a protein consisting of one or more polypeptides substantially encoded by
immunoglobulin genes. Immunoglobulins may exist in a variety of forms besides antibodies, including for example,
Fv, Fab, and F(ab')2, as well as in single chains. For a general review of immunoglobulin structure and function see
Fundamental Immunology, 2d Ed., W.E. Paul ed., Raven Press, N.Y. (1989).

Antibodies which bind polypeptides of the invention may be produced by a variety of means. The
production of non-human monoclonal antibodies, e.g., murine, lagomorpha, equine, etc., is well known and may be
accomplished by, for example, immunizing the animal with a preparation containing the polypeptide.
Antibody-producing cells obtained from the immunized animals are immortalized and screened, or screened first for
the production of antibody which inhibits binding between merozoites and erythrocytes and then immortalized.
For a discussion of general procedures of monoclonal antibody production see Harlow and Lane, Antibodies, A
Laboratory Manual, Cold Spring Harbor Publications, N.Y. (1988).

Thus, the present invention allows targeting of protective immune responses or monoclonal
antibodies to sequences in the binding domains that are conserved between SABP, DABP and encoded regions of the
DBL family. Identification of the binding regions of these proteins facilitates vaccine development because it allows
for a focus of effort upon the functional elements of the large molecules. The particular sequences within the
binding regions refine the target to critical regions that have been conserved during evolution, and are thus preferred
for use as vaccines against the parasite.

The genes of the DBL family (which have not previously been sequenced) can be used as markers
to detect the presence of the P. falciparum parasite in patients. This can be accomplished by means well known
to practitioners in the art using tissue or blood from symptomatic patients in PCR reactions with oligonucleotides
complementary to portions of the genes of the DBL family. Furthermore, sequencing the DBL family provides a
means for skilled practitioners to generate defined probes to be used as genetic markers in a variety of applications.

Additionally, the present invention defines a conserved motif present in, but not restricted to, other
members of the subphylum Apicomplexa which participates in host-parasite interaction. This motif can be identified
in Plasmodium species and other parasitic protozoa by the polymerase chain reaction using the synthetic
oligonucleotide primers shown in Figure 3. PCR methods are described in detail below. These primers are designed
from regions in the conserved motif showing the highest degree of conservation among DABP, SABP and the DBL
family. Figure 3 shows these regions and the consensus amino acid sequences derived from them.
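
When primers are designed from a consensus amino acid sequence, codon degeneracy determines how many distinct oligonucleotides are needed to cover every possible coding sequence. The following Python sketch counts that number for a peptide and locates the least degenerate window within it. The consensus peptide used is hypothetical and is not one of the sequences of Figure 3; the function names are illustrative, and the sketch is not offered as the method by which the primers of Figure 3 were chosen.

```python
from collections import Counter
from math import prod

# Standard genetic code, one letter per codon ("*" marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Number of codons encoding each amino acid, e.g. W -> 1, C -> 2, L -> 6.
CODON_COUNTS = Counter(AMINO_ACIDS.replace("*", ""))


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that can encode the peptide."""
    peptide = peptide.upper()
    unknown = [aa for aa in peptide if aa not in CODON_COUNTS]
    if unknown:
        raise ValueError(f"unrecognized residues: {unknown}")
    return prod(CODON_COUNTS[aa] for aa in peptide)


def least_degenerate_window(peptide: str, width: int) -> tuple:
    """Return (start, window, degeneracy) for the window needing the fewest oligonucleotides."""
    if not 0 < width <= len(peptide):
        raise ValueError("width must be between 1 and the peptide length")
    starts = range(len(peptide) - width + 1)
    best = min(starts, key=lambda i: degeneracy(peptide[i:i + width]))
    window = peptide[best:best + width]
    return best, window, degeneracy(window)


CONSENSUS = "LSRCWKFM"  # hypothetical consensus peptide
print(degeneracy(CONSENSUS), least_degenerate_window(CONSENSUS, 4))
```

Regions rich in tryptophan, methionine and cysteine give low degeneracy, which is consistent with drawing primers from the conserved cysteine- and tryptophan-containing parts of the motif.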

A. General Methods

Much of the nomenclature and general laboratory procedures required in this application can be
found in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Vol. 1-3, Cold Spring Harbor Laboratory,
Cold Spring Harbor, NY, 1989. The manual is hereinafter referred to as "Sambrook et al., 1989."

The practice of this invention involves the construction of recombinant nucleic acids and the
expression of genes in transfected cells. Molecular cloning techniques to achieve these ends are known in the art.
A wide variety of cloning and in vitro amplification methods suitable for the construction of recombinant nucleic acids
are well known to persons of skill. Examples of these techniques and instructions sufficient to direct persons of skill
through many cloning exercises are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods
in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger); and Current Protocols in Molecular Biology,
F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John
Wiley & Sons, Inc. (1994 Supplement) (Ausubel).

Examples of techniques sufficient to direct persons of skill through in vitro amplification methods,
including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other
RNA polymerase mediated techniques are found in Berger, Sambrook et al., 1989, and Ausubel, as well as Mullis
et al. (1987) U.S. Patent No. 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al. eds.),
Academic Press Inc., San Diego, CA (1990) ("Innis"); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The
Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al.
(1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomell et al. (1989) J. Clin. Chem. 35, 1826; Landegren et al. (1988)
Science 241, 1077-1080; Van Brunt (1990) Biotechnology 291-294; Wu and Wallace (1989) Gene 4, 560; and
Barringer et al. (1990) Gene 89, 117. Improved methods of cloning in vitro amplified nucleic acids are described
in Wallace et al., U.S. Pat. No. 5,426,039.

The culture of cells used in the present invention, including cell lines and cultured cells from tissue
or blood samples, is well known in the art. Freshney (Culture of Animal Cells, a Manual of Basic Technique, third
ed., Wiley-Liss, New York, NY (1994)) and the references cited therein provide a general guide to the culture of
cells.

DBL genes are optionally bound by antibodies in one of the embodiments of the present invention.
Methods of producing polyclonal and monoclonal antibodies are known to those of skill in the art. See, e.g., Coligan
(1991) Current Protocols in Immunology, Wiley/Greene, NY; and Harlow and Lane (1989) Antibodies: A Laboratory
Manual, Cold Spring Harbor Press, NY; Stites et al. (eds.) Basic and Clinical Immunology (4th ed.) Lange Medical
Publications, Los Altos, CA, and references cited therein; Goding (1986) Monoclonal Antibodies: Principles and
Practice (2d ed.) Academic Press, New York, NY; and Kohler and Milstein (1975) Nature 256: 495-497. Other
suitable techniques for antibody preparation include selection of libraries of recombinant antibodies in phage or similar
vectors. See, Huse et al. (1989) Science 246: 1275-1281; and Ward et al. (1989) Nature 341: 544-546. Specific
monoclonal and polyclonal antibodies will usually bind with a KD of at least about 0.1 mM, more usually at least
about 1 µM, and most preferably at least about 0.1 µM or better.

B. Methods for isolating DNA encoding SABP, DABP and DBL binding regions

The nucleic acid compositions of this invention, whether RNA, cDNA, genomic DNA, or a hybrid
of the various combinations, may be isolated from natural sources or may be synthesized in vitro. The nucleic acids
claimed may be present in transformed or transfected whole cells, in a transformed or transfected cell lysate, or in
a partially purified or substantially pure form.

Techniques for nucleic acid manipulation of genes encoding the binding domains of the invention,
such as subcloning nucleic acid sequences encoding polypeptides into expression vectors, labelling probes, DNA
hybridization, and the like are described generally in Sambrook et al., 1989.

Recombinant DNA techniques can be used to produce the binding domain polypeptides. In general,
the DNA encoding the SABP and DABP binding domains is first cloned or isolated in a form suitable for ligation
into an expression vector. After ligation, the vectors containing the DNA fragments or inserts are introduced into
a suitable host cell for expression of the recombinant binding domains. The polypeptides are then isolated from the
host cells.

There are various methods of isolating the DNA sequences encoding the SABP, DABP and DBL
binding domains. Typically, the DNA is isolated from a genomic or cDNA library using labelled oligonucleotide probes
specific for sequences in the DNA. Restriction endonuclease digestion of genomic DNA or cDNA containing the
appropriate genes can also be used to isolate the DNA encoding the binding domains of these proteins. Since the DNA
sequences of the SABP and DABP genes are known, a panel of restriction endonucleases can be constructed to give
cleavage of the DNA in the desired regions. After restriction endonuclease digestion, DNA encoding the SABP binding
domain or DABP binding domain is identified by its ability to hybridize with nucleic acid probes, for example on
Southern blots, and these DNA regions are isolated by standard methods familiar to those of skill in the art. See
Sambrook et al., 1989.
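
Because the gene sequences are known, candidate enzymes for such a panel can be screened computationally before any digestion is attempted. The Python sketch below is a minimal illustration under stated assumptions: the enzyme table lists only three widely known recognition sequences, the function names are hypothetical, and positions refer to the start of the recognition site on the given strand rather than to the exact cut position. It reports enzymes that have a site on each side of a target region and none overlapping it.

```python
import re

# Recognition sequences of three common enzymes (illustrative panel only).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, enzymes=ENZYMES):
    """Map each enzyme name to the 0-based start positions of its recognition site."""
    seq = seq.upper()
    return {
        # The lookahead lets overlapping occurrences be reported as well.
        name: [m.start() for m in re.finditer(f"(?={site})", seq)]
        for name, site in enzymes.items()
    }


def enzymes_flanking(seq, start, end, enzymes=ENZYMES):
    """Return enzymes with a site upstream of `start`, a site at or after `end`,
    and no site overlapping the region [start, end).

    The value for each enzyme is (nearest upstream site, nearest downstream site).
    """
    result = {}
    for name, positions in find_sites(seq, enzymes).items():
        site_len = len(enzymes[name])
        upstream = [p for p in positions if p + site_len <= start]
        downstream = [p for p in positions if p >= end]
        inside = [p for p in positions if p + site_len > start and p < end]
        if upstream and downstream and not inside:
            result[name] = (max(upstream), min(downstream))
    return result


if __name__ == "__main__":
    # Toy sequence, not a real SABP or DABP sequence.
    toy = "AA" + "GAATTC" + "T" * 8 + "GGATCC" + "T" * 6 + "GAATTC" + "AA"
    print(find_sites(toy))
    print(enzymes_flanking(toy, 10, 14))
```

In practice the candidate fragments would still be confirmed by hybridization, as described above.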

The polymerase chain reaction can also be used to prepare DABP, SABP and DBL binding domain DNA.
Polymerase chain reaction (PCR) technology is used to amplify nucleic acid sequences of the DABP and SABP binding
domains directly from mRNA, from cDNA, and from genomic libraries or cDNA libraries. The primers shown in Figure
3 are particularly preferred for this process.

Appropriate primers and probes for amplifying the SABP and DABP binding region DNAs are
generated from analysis of the DNA sequences. In brief, oligonucleotide primers complementary to the two 3' borders
of the DNA region to be amplified are synthesized. The polymerase chain reaction is then carried out using the two
primers. See PCR Protocols: A Guide to Methods and Applications (Innis, M., Gelfand, D., Sninsky, J. and White,
T., eds.), Academic Press, San Diego, CA (1990). Primers can be selected to amplify the entire DABP regions or
to amplify smaller segments of the DABP and SABP binding domains, as desired.
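
The idea of primers complementary to the two 3' borders can be sketched in Python as follows. This is only an illustration under stated assumptions: the template is given as the sense strand written 5' to 3', the primer length is fixed, and the melting temperature uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is a rough estimate for short oligonucleotides. The function names are hypothetical, and the toy sequence is not one of the primers of Figure 3.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(template, start, end, length=20):
    """Pick a primer pair bounding template[start:end] (sense strand, 5'->3').

    The forward primer copies the 5' end of the region, so it anneals to the
    3' border of the antisense strand. The reverse primer is the reverse
    complement of the 3' end of the region, so it anneals to the 3' border of
    the sense strand.
    """
    region = template.upper()[start:end]
    if len(region) < 2 * length:
        raise ValueError("region is too short for two non-overlapping primers")
    forward = region[:length]
    reverse = reverse_complement(region[-length:])
    return forward, reverse


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    fwd, rev = design_primers("ATGCCCGGGTTTAACC", 0, 16, length=4)
    print(fwd, rev, wallace_tm(fwd), wallace_tm(rev))
```

Changing `start` and `end` selects either a whole region or a smaller segment of a binding domain, which mirrors the choice described in the text.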

Oligonucleotides for use as probes are chemically synthesized according to the solid phase
phosphoramidite triester method first described by Beaucage, S.L. and Caruthers, M.H., 1981, Tetrahedron Letts.,
22(20):1859-1862 using an automated synthesizer, as described in Needham-VanDevanter, D.R., et al., 1984, Nucleic
Acids Res., 12:6159-6168. Purification of oligonucleotides is by either native acrylamide gel electrophoresis or by
anion-exchange HPLC as described in Pearson, J.D. and Regnier, F.E., 1983, J. Chrom., 255:137-149.

The sequence of the synthetic oligonucleotides can be verified using the chemical degradation
method of Maxam, A.M. and Gilbert, W., 1980, in Grossman, L. and Moldave, D., eds., Academic Press, New York,
NY, Methods in Enzymology 65:499-560.

Other methods known to those of skill in the art may also be used to isolate DNA encoding all
or part of the SABP or DABP binding domains. See Sambrook et al., 1989.

C. Expression of DABP, SABP and DBL Binding Domain Polypeptides
Once binding domain DNAs are isolated and cloned, one may express the desired polypeptides in
a recombinantly engineered cell such as bacteria, yeast, insect (especially employing baculoviral vectors), and
mammalian cells. It is expected that those of skill in the art are knowledgeable in the numerous expression systems
available for expression of the DNA encoding the DABP and SABP binding domains. No attempt to describe in detail
the various methods known for the expression of proteins in prokaryotes or eukaryotes will be made.

In brief summary, the expression of natural or synthetic nucleic acids encoding binding domains
will typically be achieved by operably linking the DNA or cDNA to a promoter (which is either constitutive or
inducible), followed by incorporation into an expression vector. The vectors can be suitable for replication and
integration in either prokaryotes or eukaryotes. Typical expression vectors contain transcription and translation
terminators, initiation sequences, and promoters useful for regulation of the expression of the DNA encoding the
binding domains. To obtain high level expression of a cloned gene, it is desirable to construct expression plasmids
which contain, at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational
initiation, and a transcription/translation terminator.

1. Expression in Prokaryotes

Examples of regulatory regions suitable for this purpose in E. coli are the promoter and operator
region of the E. coli tryptophan biosynthetic pathway as described by Yanofsky, C., 1984, J. Bacteriol.,
158:1018-1024 and the leftward promoter of phage lambda (PL) as described by Herskowitz, I. and Hagen, D., 1980,
Ann. Rev. Genet., 14:399-445. The inclusion of selection markers in DNA vectors transformed in E. coli is also
useful. Examples of such markers include genes specifying resistance to ampicillin, tetracycline, or chloramphenicol.
See Sambrook et al., 1989, for details concerning selection markers for use in E. coli.

The vector is selected to allow introduction into the appropriate host cell. Bacterial vectors are
typically of plasmid or phage origin. Appropriate bacterial cells are infected with phage vector particles or 
transfected with naked phage vector DNA. If a plasmid vector is used, the bacterial cells are transfected with the 
plasmid vector DNA. 

Expression systems for expressing the DABP and SABP binding domains are available using E. coli,
Bacillus sp. (Palva, I. et al., 1983, Gene 22:229-235; Mosbach, K. et al., Nature, 302:543-545) and Salmonella. E.
coli systems are preferred.

The binding domain polypeptides produced by prokaryote cells may not necessarily fold properly.
During purification from E. coli, the expressed polypeptides may first be denatured and then renatured. This can be
accomplished by solubilizing the bacterially produced proteins in a chaotropic agent such as guanidine HCl and
reducing all the cysteine residues with a reducing agent such as beta-mercaptoethanol. The polypeptides are then
renatured, either by slow dialysis or by gel filtration. See U.S. Patent No. 4,511,503.

Detection of the expressed antigen is achieved by methods known in the art such as radioimmunoassays,
Western blotting techniques or immunoprecipitation. Purification from E. coli can be achieved following procedures
described in U.S. Patent No. 4,511,503.

2. Synthesis of SABP, DABP and DBL Binding Domains in Eukaryotes

A variety of eukaryotic expression systems such as yeast, insect cell lines and mammalian cells, 
are known to those of skill in the art. As explained briefly below, the DABP and SABP binding domains may also 
be expressed in these eukaryotic systems. 
a. Expression in Yeast

Synthesis of heterologous proteins in yeast is well known and described. Methods in Yeast
Genetics, Sherman, F., et al., Cold Spring Harbor Laboratory, (1982) is a well recognized work describing the various
methods available to produce the binding domains in yeast.

Examples of promoters for use in yeast include GAL1,10 (Johnson, M., and Davies, R.W., 1984,
Mol. and Cell. Biol., 4:1440-1448), ADH2 (Russell, D., et al., 1983, J. Biol. Chem., 258:2674-2682), PHO5 (EMBO
J. 6:675-680, 1982), and MFα1 (Herskowitz, I. and Oshima, Y., 1982, in The Molecular Biology of the Yeast
Saccharomyces (eds. Strathern, J.N., Jones, E.W., and Broach, J.R.), Cold Spring Harbor Lab., Cold Spring Harbor,
N.Y., pp. 181-209). A multicopy plasmid with a selective marker such as Leu-2, URA-3, Trp-1, and His-3 is also
desirable.

A number of yeast expression plasmids like YEp6, YEp13, YEp4 can be used as vectors. A gene
of interest can be fused to any of the promoters in various yeast vectors. The above-mentioned plasmids have been
fully described in the literature (Botstein et al., 1979, Gene, 8:17-24; Broach et al., 1979, Gene, 8:121-133).

Two procedures are used in transforming yeast cells. In one case, yeast cells are first converted
into protoplasts using zymolyase, lyticase or glusulase, followed by addition of DNA and polyethylene glycol (PEG).
The PEG-treated protoplasts are then regenerated in a 3% agar medium under selective conditions. Details of this
procedure are given in the papers by J.D. Beggs, 1978, Nature (London), 275:104-109; and Hinnen, A., et al., 1978,
Proc. Natl. Acad. Sci. USA, 75:1929-1933. The second procedure does not involve removal of the cell wall. Instead,
the cells are treated with lithium chloride or acetate and PEG and put on selective plates (Ito, H., et al., 1983, J.
Bact., 153:163-168).

The binding domains can be isolated from yeast by lysing the cells and applying standard protein
isolation techniques to the lysates. The monitoring of the purification process can be accomplished by using Western
blot techniques or radioimmunoassays or other standard immunoassay techniques.

b. Expression in Mammalian and Insect Cell Cultures 
Illustrative of cell cultures useful for the production of the binding domains are cells of insect or
mammalian origin. Mammalian cell systems often will be in the form of monolayers of cells although mammalian cell
suspensions may also be used. Illustrative examples of mammalian cell lines include VERO and HeLa cells, Chinese
hamster ovary (CHO) cell lines, WI38, BHK, Cos-7 or MDCK cell lines.

As indicated above, the vector, e.g., a plasmid, which is used to transform the host cell,
preferably contains DNA sequences to initiate transcription and sequences to control the translation of the antigen
gene sequence. These sequences are referred to as expression control sequences. When the host cell is of insect
or mammalian origin, illustrative expression control sequences are obtained from the SV-40 promoter (Science,
222:524-527, 1983), the CMV I.E. promoter (Proc. Natl. Acad. Sci. 81:659-663, 1984) or the metallothionein
promoter (Nature 296:39-42, 1982). The cloning vector containing the expression control sequences is cleaved using
restriction enzymes and adjusted in size as necessary or desirable and ligated with DNA coding for the SABP or
DABP polypeptides by means well known in the art.

As with yeast, when higher animal host cells are employed, polyadenylation or transcription
terminator sequences from known mammalian genes need to be incorporated into the vector. An example of a
terminator sequence is the polyadenylation sequence from the bovine growth hormone gene. Sequences for accurate
splicing of the transcript may also be included. An example of a splicing sequence is the VP1 intron from SV40
(Sprague, J. et al., 1983, J. Virol. 45: 773-781).

Additionally, gene sequences to control replication in the host cell may be incorporated into the
vector such as those found in bovine papilloma virus type-vectors. See Saveria-Campo, M., 1985, "Bovine Papilloma virus
DNA: a Eukaryotic Cloning Vector" in DNA Cloning Vol. II: a Practical Approach, Ed. D.M. Glover, IRL Press, Arlington,
Virginia, pp. 213-238.

The host cells are competent or rendered competent for transformation by various means. There 
are several well-known methods of introducing DNA into animal cells. These include: calcium phosphate precipitation, 
fusion of the recipient cells with bacterial protoplasts containing the DNA, treatment of the recipient cells with
liposomes containing the DNA, DEAE dextran, electroporation and micro-injection of the DNA directly into the cells. 

The transformed cells are cultured by means well known in the art. See Biochemical Methods in Cell
Culture and Virology, Kuchler, R.J., Dowden, Hutchinson and Ross, Inc., (1977). The expressed DABP and SABP
binding domain polypeptides are isolated from cells grown as suspensions or as monolayers. The latter are recovered
by well known mechanical, chemical or enzymatic means.

c. Expression in recombinant vaccinia virus- or adenovirus-infected cells 
In addition to use in recombinant expression systems, the isolated binding domain DNA sequences 
can also be used to transform viruses that transfect host cells in the patient. Live attenuated viruses, such as 
vaccinia or adenovirus, are convenient alternatives for vaccines because they are inexpensive to produce and are 
easily transported and administered. Vaccinia vectors and methods useful in immunization protocols are described,
for example, in U.S. Patent No. 4,722,848. 

Suitable viruses for use in the present invention include, but are not limited to, pox viruses, such 
as canarypox and cowpox viruses, and vaccinia viruses, alpha viruses, adenoviruses, and other animal viruses. The 
recombinant viruses can be produced by methods well known in the art, for example, using homologous recombination 
or ligating two plasmids. A recombinant canarypox or cowpox virus can be made, for example, by inserting the
DNAs encoding the DABP and SABP binding domain polypeptides into plasmids so that they are flanked by viral
sequences on both sides. The DNAs encoding the binding domains are then inserted into the virus genome through
homologous recombination. 

A recombinant adenovirus can be produced, for example, by ligating together two plasmids each 
containing about 50% of the viral sequence and the DNA sequence encoding erythrocyte binding domain polypeptide.
Recombinant RNA viruses such as the alpha virus can be made via a cDNA intermediate using methods known in 
the art. 

In the case of vaccinia virus (for example, strain WR), the DNA sequence encoding the binding 
domains can be inserted in the genome by a number of methods including homologous recombination using a transfer 
vector, pTKgpt-OFIS, as described in Kaslow et al., Science 252:1310-1313 (1991).

Alternately, the DNA encoding the SABP and DABP binding domains may be inserted into another
plasmid designed for producing recombinant vaccinia, such as pGS62 (Langford, C.L., et al., 1986, Mol. Cell. Biol.
6:3191-3199). This plasmid consists of a cloning site for insertion of foreign genes, the P7.5 promoter of vaccinia
to direct synthesis of the inserted gene, and the vaccinia TK gene flanking both ends of the foreign gene.

Confirmation of production of recombinant virus can be achieved by DNA hybridization using cDNA
encoding the DABP and SABP binding domain polypeptides and by immunodetection techniques using antibodies
specific for the expressed binding domain polypeptides. Virus stocks may be prepared by infection of cells such as
HeLa S3 spinner cells and harvesting of virus progeny.

The recombinant virus of the present invention can be used to induce anti-SABP and anti-DABP 
binding domain antibodies in mammals, such as mice or humans. In addition, the recombinant virus can be used to 
produce the SABP and DABP binding domains by infecting host cells in vitro, which in turn express the polypeptide
(see section on expression of SABP and DABP binding domains in eukaryotic cells, above).

The present invention also relates to host cells infected with the recombinant virus. The host cells
of the present invention are preferably mammalian, such as BSC-1 cells. Host cells infected with the recombinant
virus express the DABP and SABP binding domains on their cell surfaces. In addition, membrane extracts of the
infected cells induce protective antibodies when used to inoculate or boost previously inoculated mammals.

D. Purification of the SABP, DABP and DBL Binding Domain Polypeptides

The binding domain polypeptides produced by recombinant DNA technology may be purified by
standard techniques well known to those of skill in the art. Recombinantly produced binding domain polypeptides
can be directly expressed or expressed as a fusion protein. The protein is then purified by a combination of cell lysis
(e.g., sonication) and affinity chromatography. For fusion products, subsequent digestion of the fusion protein with
an appropriate proteolytic enzyme releases the desired SABP and DABP binding domains.

The polypeptides of this invention may be purified to substantial purity by standard techniques 
well known in the art, including selective precipitation with such substances as ammonium sulfate, column 
chromatography, immunopurification methods, and others. See, for instance, R. Scopes, Protein Purification: 
Principles and Practice, Springer-Verlag, New York, NY (1982).

E. Production of Binding Domains by protein chemistry techniques

The polypeptides of the invention can be synthetically prepared in a wide variety of ways. For
instance, polypeptides of relatively short size can be synthesized in solution or on a solid support in accordance with
conventional techniques. Various automatic synthesizers are commercially available and can be used in accordance
with known protocols. See, for example, Stewart and Young, Solid Phase Peptide Synthesis, 2d. ed., Pierce Chemical
Co. (1984).

Alternatively, purified and isolated SABP, DABP or DBL family proteins may be treated with
proteolytic enzymes in order to produce the binding domain polypeptides. For example, recombinant DABP and SABP
proteins may be used for this purpose. The DABP and SABP protein sequence may then be analyzed to select
proteolytic enzymes to be used to generate polypeptides containing desired regions of the DABP and SABP binding
domain. The desired polypeptides are then purified by using standard techniques for protein and peptide purification.
For a review of standard techniques see Methods in Enzymology, "Guide to Protein Purification", M. Deutscher, ed.,
Vol. 182 (1990), pages 619-626.
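
The sequence analysis mentioned above can be approximated by an in-silico digest. The Python sketch below is a minimal illustration that assumes a trypsin-like specificity (cleavage on the C-terminal side of lysine or arginine, blocked when the next residue is proline). The function names, default rule and toy sequence are hypothetical, and other proteases would need their own rules.

```python
def digest(protein, cut_after="KR", blocked_by="P"):
    """Split a protein into fragments, cutting after any residue in `cut_after`
    unless the following residue is in `blocked_by`.

    Returns a list of (start, end, sequence) tuples with 0-based, end-exclusive
    coordinates on the input protein.
    """
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    fragments = []
    start = 0
    # The last residue is never a cut point, so stop one short of the end.
    for i in range(len(protein) - 1):
        if protein[i] in cut_after and protein[i + 1] not in blocked_by:
            fragments.append((start, i + 1, protein[start:i + 1]))
            start = i + 1
    fragments.append((start, len(protein), protein[start:]))
    return fragments


def fragments_covering(fragments, region_start, region_end):
    """Keep only fragments that contain the whole region [region_start, region_end)."""
    return [f for f in fragments if f[0] <= region_start and f[1] >= region_end]


if __name__ == "__main__":
    toy = "AAKPGGRCCKDD"  # toy sequence, not a real DABP or SABP sequence
    frags = digest(toy)
    print(frags)
    print(fragments_covering(frags, 3, 6))
```

An enzyme would be a reasonable candidate when `fragments_covering` returns a fragment that spans the desired region of the binding domain.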

F. Modification of nucleic acid and polypeptide sequences 

The nucleotide sequences used to transfect the host cells used for production of recombinant
binding domain polypeptides can be modified according to standard techniques to yield binding domain polypeptides
with a variety of desired properties. The binding domain polypeptides of the present invention can be readily
designed and manufactured utilizing various recombinant DNA techniques well known to those skilled in the art. For
example, the binding domain polypeptides can vary from the naturally-occurring sequence at the primary structure
level by amino acid insertions, substitutions, deletions, and the like. These modifications can be used in a number
of combinations to produce the final modified protein chain.

The amino acid sequence variants can be prepared with various objectives in mind, including 
facilitating purification and preparation of the recombinant polypeptides. The modified polypeptides are also useful 
for modifying plasma half-life, improving therapeutic efficacy, and lessening the severity or occurrence of side effects 
during therapeutic use. The amino acid sequence variants are usually predetermined variants not found in nature but
exhibit the same immunogenic activity as naturally occurring polypeptides. For instance, polypeptide fragments
comprising only a portion (usually at least about 60-80%, typically 90-95%) of the primary structure may be 
produced. For use as vaccines, polypeptide fragments are typically preferred so long as at least one epitope capable 
of eliciting production of blocking antibodies remains. 

In general, modifications of the sequences encoding the binding domain polypeptides may be readily
accomplished by a variety of well-known techniques, such as site-directed mutagenesis (see Gillman and Smith, Gene
8:81-97 (1979) and Roberts, S. et al., Nature 328:731-734 (1987)). One of ordinary skill will appreciate that the
effect of many mutations is difficult to predict. Thus, most modifications are evaluated by routine screening in a
suitable assay for the desired characteristic. For instance, changes in the immunological character of the polypeptide
can be detected by an appropriate competitive binding assay. Modifications of other properties such as redox or
thermal stability, hydrophobicity, susceptibility to proteolysis, or the tendency to aggregate are all assayed according
to standard techniques.

G. Diagnostic and Screening Assays 

The polypeptides and nucleic acids of the invention can be used in diagnostic applications for the
detection of merozoites or nucleic acids in a biological sample. The presence of parasites can be detected using
several well recognized specific binding assays based on immunological results. (See U.S. Patents 4,366,241;
4,376,110; 4,517,288; and 4,837,168). For instance, labeled monoclonal antibodies to polypeptides of the invention
can be used to detect merozoites in a biological sample. Alternatively, labelled polypeptides of the invention can be
used to detect the presence of antibodies to SABP or DABP in a biological sample. For a review of the general
procedures in diagnostic immunoassays, see also Basic and Clinical Immunology, 7th Edition (D. Stites and A. Terr,
eds.) 1991.

In addition, modified polypeptides, antibodies or other compounds capable of inhibiting the
interaction between SABP or DABP and erythrocytes can be assayed for biological activity. For instance,
polypeptides can be recombinantly expressed on the surface of cells and the ability of the cells to bind erythrocytes
can be measured as described below. Alternatively, peptides or antibodies can be tested for the ability to inhibit binding
between erythrocytes and merozoites or SABP and DABP.

Cell-free assays can also be used to measure binding of DABP or SABP polypeptides to isolated Duffy
antigen or glycophorin polypeptides. For instance, the erythrocyte proteins can be immobilized on a solid surface and
binding of labelled SABP or DABP polypeptides can be measured.

Many assay formats employ labelled assay components. The labelling systems can be in a variety of forms.
The label may be coupled directly or indirectly to the desired component of the assay according to methods well
known in the art. A wide variety of labels may be used. The component may be labelled by any one of several
methods. The most common method of detection is the use of autoradiography with 3H, 125I, 35S, 14C, or 32P
labelled compounds or the like. Non-radioactive labels include ligands which bind to labelled antibodies, fluorophores,
chemiluminescent agents, enzymes, and antibodies which can serve as specific binding pair members for a labelled
ligand. The choice of label depends on sensitivity required, ease of conjugation with the compound, stability
requirements, and available instrumentation.

In addition, the polypeptides of the invention can be assayed using animal models, well known to those
of skill in the art. For P. falciparum the in vivo models include Aotus sp. monkeys or chimpanzees; for P. vivax the
in vivo models include Saimiri monkeys.

In the case of the use of nucleic acids for diagnostic purposes, standard nucleic acid hybridization
techniques can be used to detect the presence of the genes identified here (e.g., members of the DBL family). If
desired, nucleic acids in the sample may first be amplified using standard procedures such as PCR. Diagnostic kits
comprising the appropriate primers and probes can also be prepared.
H. DBL Targeted Therapeutics
DBL polypeptides are expressed on the surface of Plasmodium-infected erythrocytes. As such, they
present ideal targets for therapeutics which target infected erythrocytes. In one preferred embodiment of the
present invention, cytotoxic antibodies or antibody fusion proteins with cytotoxic agents are targeted against DBL
proteins, killing infected erythrocytes and inhibiting the reproduction of Plasmodium in an infected host.

The procedure for attaching a cytotoxic agent to an antibody will vary according to the chemical
structure of the agent. Antibodies and cytotoxic agents are typically bound together chemically or, where the
antibody and cytotoxic agents are both polypeptides, are optionally synthesized recombinantly as a fusion protein.
Polypeptides typically contain a variety of functional groups, e.g., carboxylic acid (COOH) or free amine (-NH2) groups,
which are available for reaction with a suitable functional group on either the antibody or the cytotoxic agent.

Alternatively, antibodies or cytotoxic agents are derivatized to attach additional reactive functional
groups. The derivatization optionally involves attachment of linker molecules such as those available from Pierce
Chemical Company, Rockford, Illinois. A "linker", as used herein, is a molecule that is used to join the nucleic acid
binding molecule to the receptor ligand. The linker is capable of forming covalent bonds to both the antibody and
the cytotoxic agent. Suitable linkers are well known to those of skill in the art and include, but are not limited to,
straight or branched-chain carbon linkers, heterocyclic carbon linkers, or peptide linkers. Where the antibody and the
cytotoxic agent are polypeptides, the linkers are joined to the constituent amino acids through their side groups (e.g.,
through a disulfide linkage to cysteine) or to the alpha carbon amino and carboxyl groups of the terminal amino acids.

A bifunctional linker having one functional group reactive with a group on a particular ligand, and
another group reactive with a nucleic acid binding molecule, can be used to form the desired conjugate. Alternatively,
derivatization can proceed through chemical treatment of the ligand or nucleic acid binding molecule, e.g., glycol
cleavage of the sugar moiety of a glycoprotein with periodate to generate free aldehyde groups. The free aldehyde
groups on the glycoprotein may be reacted with free amine or hydrazine groups on an agent to bind the agent thereto
(see, e.g., U.S. Patent No. 4,671,958). Procedures for generation of free sulfhydryl groups on polypeptides are
known (see, e.g., U.S. Pat. No. 4,659,839).

Many procedures and linker molecules for attachment of various compounds to proteins are known.
See, for example, European Patent Application No. 188,256; U.S. Patent Nos. 4,671,958, 4,659,839, 4,414,148,
4,699,784; 4,680,338; 4,569,789; and 4,589,071; and Borlinghaus et al., Cancer Res. 47: 4071-4075 (1987). In
particular, production of various antibody conjugates is well-known within the art and can be found, for example, in
Thorpe et al., Monoclonal Antibodies in Clinical Medicine, Academic Press, pp. 168-190 (1982), Waldmann, Science,
252: 1657 (1991), and U.S. Patent Nos. 4,545,985 and 4,894,443.

A number of antibodies which bind cell surface receptors have been converted to a form suitable
for incorporation into fusion proteins, and similar strategies are used to create fusion-protein antibodies which bind
DBR polypeptides (see Batra et al., Mol. Cell. Biol., 11: 2200-2205 (1991); Batra et al., Proc. Natl. Acad. Sci. USA,
89: 5867-5871 (1992); Brinkmann et al., Proc. Natl. Acad. Sci. USA, 88: 8616-8620 (1991); Brinkmann et al., Proc.
Natl. Acad. Sci. USA, 90: 547-551 (1993); Chaudhary et al., Proc. Natl. Acad. Sci. USA, 87: 1066-1070 (1990);
Friedman et al., Cancer Res. 53: 334-339 (1993); Kreitman et al., J. Immunol., 149: 2810-2815 (1992); Nicholls
et al., J. Biol. Chem., 268: 5302-5308 (1993); and Wells et al., Cancer Res., 52: 6310-6317 (1992), respectively).

B. Production of Fusion Proteins

Where the antibody fragment and/or the cytotoxic agents are relatively short polypeptides (i.e.,
less than about 50 amino acids) they are often synthesized using standard chemical peptide synthesis techniques.
Where both molecules are relatively short, a chimeric molecule is optionally synthesized as a single contiguous
polypeptide. Alternatively, the ligand and the nucleic acid binding molecule can be synthesized separately and then
fused chemically.

Solid phase synthesis in which the C-terminal amino acid of the sequence is attached to an
insoluble support followed by sequential addition of the remaining amino acids in the sequence is a preferred method
for the chemical synthesis of the ligands of this invention. Techniques for solid phase synthesis are described by
Barany and Merrifield, Solid-Phase Peptide Synthesis; pp. 3-284 in The Peptides: Analysis, Synthesis, Biology. Vol.
2: Special Methods in Peptide Synthesis, Part A.; Merrifield et al., J. Am. Chem. Soc., 85: 2149-2156 (1963); and
Stewart et al., Solid Phase Peptide Synthesis, 2nd ed., Pierce Chem. Co., Rockford, Ill. (1984).

In a preferred embodiment, the fusion molecules of the invention are synthesized using recombinant
nucleic acid methodology. Generally this involves creating a nucleic acid sequence that encodes the receptor-targeted
fusion molecule, placing the nucleic acid in an expression cassette under the control of a particular promoter,
expressing the protein in a host, isolating the expressed protein and, if required, renaturing the protein. Techniques
sufficient to guide one of skill through such procedures are found in, e.g., Berger, Sambrook, Ausubel, Innis, and
Freshney (all supra).

While the two molecules are often joined directly together, one of skill will appreciate that the 
molecules may be separated by a peptide spacer consisting of one or more amino acids. Generally the spacer will 
have no specific biological activity other than to join the proteins or to preserve some minimum distance or other
spatial relationship between them. However, the constituent amino acids of the spacer may be selected to influence 
some property of the molecule such as the folding, net charge, or hydrophobicity. 

Once expressed, recombinant fusion proteins can be purified according to standard procedures,
including ammonium sulfate precipitation, affinity columns, column chromatography, gel electrophoresis and the like
(see, generally, R. Scopes, Protein Purification, Springer-Verlag, N.Y. (1982), Deutscher, Methods in Enzymology Vol.
182: Guide to Protein Purification, Academic Press, Inc. N.Y. (1990)). Substantially pure compositions of about 50
to 95% homogeneity are preferred, and 80 to 95% or greater homogeneity are most preferred for use as therapeutic
agents.

One of skill in the art will recognize that after chemical synthesis, biological expression, or
purification, the fusion molecule may possess a conformation substantially different than the native conformations
of the constituent polypeptides. In this case, it is often necessary to denature and reduce the polypeptide and then
to cause the polypeptide to re-fold into the preferred conformation. Methods of reducing and denaturing proteins
and inducing re-folding are well known to those of skill in the art (see Debinski et al., J. Biol. Chem., 268: 14065-14070
(1993); Kreitman and Pastan, Bioconjug. Chem., 4: 581-585 (1993); and Buchner et al., Anal. Biochem., 205:
263-270 (1992)).

I. Pharmaceutical compositions comprising binding domain polypeptides 
The polypeptides of the invention are useful in therapeutic and prophylactic applications for the 
treatment of malaria. Pharmaceutical compositions of the invention are suitable for use in a variety of drug delivery 
systems. Suitable formulations for use in the present invention are found in Remington's Pharmaceutical Sciences, 
Mack Publishing Company, Philadelphia, PA, 17th ed. (1985). For a brief review of methods for drug delivery, see
Langer, Science 249:1527-1533 (1990).

The polypeptides of the present invention can be used in pharmaceutical and vaccine compositions 
that are useful for administration to mammals, particularly humans. The polypeptides can be administered together 
in certain circumstances, e.g. where infection by both P. falciparum and P. vivax is likely. Thus, a single
pharmaceutical composition can be used for the treatment or prophylaxis of malaria caused by both parasites.

The compositions are suitable for single administrations or a series of administrations. When given 
as a series, inoculations subsequent to the initial administration are given to boost the immune response and are 
typically referred to as booster inoculations. 

The pharmaceutical compositions of the invention are intended for parenteral, topical, oral or local
administration. Preferably, the pharmaceutical compositions are administered parenterally, e.g., intravenously,
subcutaneously, intradermally, or intramuscularly. Thus, the invention provides compositions for parenteral
administration that comprise a solution of the agents described above dissolved or suspended in an acceptable carrier,
preferably an aqueous carrier. A variety of aqueous carriers may be used, e.g., water, buffered water, 0.4% saline,
0.3% glycine, hyaluronic acid and the like. These compositions may be sterilized by conventional, well known
sterilization techniques, or may be sterile filtered. The resulting aqueous solutions may be packaged for use as is,
or lyophilized, the lyophilized preparation being combined with a sterile solution prior to administration. The
compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological
conditions, such as pH adjusting and buffering agents, tonicity adjusting agents, wetting agents and the like, for
example, sodium acetate, sodium lactate, sodium chloride, potassium chloride, calcium chloride, sorbitan monolaurate,
triethanolamine oleate, etc.

For solid compositions, conventional nontoxic solid carriers may be used which include, for example,
pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharin, talcum, cellulose, glucose,
sucrose, magnesium carbonate, and the like. For oral administration, a pharmaceutically acceptable nontoxic
composition is formed by incorporating any of the normally employed excipients, such as those carriers previously
listed, and generally 10-95% of active ingredient and more preferably at a concentration of 25%-75%.

For aerosol administration, the polypeptides are preferably supplied in finely divided form along with
a surfactant and propellant. The surfactant must, of course, be nontoxic, and preferably soluble in the propellant.
Representative of such agents are the esters or partial esters of fatty acids containing from 6 to 22 carbon atoms,
such as caproic, octanoic, lauric, palmitic, stearic, linoleic, linolenic, olesteric and oleic acids with an aliphatic
polyhydric alcohol or its cyclic anhydride. Mixed esters, such as mixed or natural glycerides may be employed. A
carrier can also be included, as desired, as with, e.g., lecithin for intranasal delivery.

In certain embodiments patients with malaria may be treated with SABP or DABP polypeptides 
or other specific blocking agents (e.g., monoclonal antibodies) that prevent binding of Plasmodium merozoites and
schizonts to the erythrocyte surface. 

The amount administered to the patient will vary depending upon what is being administered, the
state of the patient and the manner of administration. In therapeutic applications, compositions are administered
to a patient already suffering from malaria in an amount sufficient to inhibit spread of the parasite through
erythrocytes and thus cure or at least partially arrest the symptoms of the disease and its complications. An amount
adequate to accomplish this is defined as a "therapeutically effective dose." Amounts effective for this use will depend
on the severity of the disease, the particular composition, and the weight and general state of the patient. Generally,
the dose will be in the range of about 1 mg to about 5 gm per day, preferably about 100 mg per day, for a 70 kg
patient.

Alternatively, the polypeptides of the invention can be used prophylactically as vaccines. The 
vaccines of the invention contain as an active ingredient an immunogenically effective amount of the binding domain
polypeptide or of a recombinant virus as described herein. The immune response may include the generation of
antibodies; activation of cytotoxic T lymphocytes (CTL) against cells presenting peptides derived from the peptides
encoded by the SABP, DABP or DBL sequences of the present invention, or other mechanisms well known in the art.
See, e.g., Paul, Fundamental Immunology, Second Edition (Raven Press, New York, NY) for a description of immune
response. Useful carriers are well known in the art, and include, for example, thyroglobulin, albumins such as human
serum albumin, tetanus toxoid, polyamino acids such as poly(D-lysine:D-glutamic acid), influenza, hepatitis B virus core
protein, hepatitis B virus recombinant vaccine. The vaccines can also contain a physiologically tolerable (acceptable) 
diluent such as water, phosphate buffered saline, or saline, and further typically include an adjuvant. Adjuvants such 
as incomplete Freund's adjuvant, aluminum phosphate, aluminum hydroxide, or alum are materials well known in the 
art. 

The DNA or RNA encoding the SABP or DABP binding domains and the DBL gene family motifs 
may be introduced into patients to obtain an immune response to the polypeptides which the nucleic acid encodes. 
See Wolff et al., Science 247: 1465-1468 (1990), which describes the use of nucleic acids to produce expression of
the genes which the nucleic acids encode.

Vaccine compositions containing the polypeptides, nucleic acids or viruses of the invention are 
administered to a patient to elicit a protective immune response against the polypeptide. A "protective immune
response" is one which prevents or inhibits the spread of the parasite through erythrocytes and thus at least partially
prevents the symptoms of the disease and its complications. An amount sufficient to accomplish this is defined as
an "immunogenically effective dose." Amounts effective for this use will depend on the composition, the manner
of administration, the weight and general state of health of the patient, and the judgment of the prescribing
physician. For peptide compositions, the general range for the initial immunization (that is, for therapeutic or
prophylactic administration) is from about 100 μg to about 1 gm of peptide for a 70 kg patient, followed by
boosting dosages of from about 100 μg to about 1 gm of the polypeptide pursuant to a boosting regimen over
weeks to months depending upon the patient's response and condition, e.g., by measuring levels of parasite in the
patient's blood. For nucleic acids, typically 30-1000 μg of nucleic acid is injected into a 70 kg patient, more typically
about 50-150 μg of nucleic acid is injected into a 70 kg patient, followed by boosting doses as appropriate.

The following examples illustrate preferred embodiments of the invention. 
EXAMPLE 1: Identification of the amino-terminal, cysteine-rich region of SABP and DABP as binding
domains for erythrocytes

1. Expression of the SABP binding domain polypeptide on the surface of Cos cells.

To demonstrate that the amino-terminal, cysteine-rich region of the SABP protein is the sialic acid binding 
region, this region of the protein was expressed on the surface of mammalian Cos cells in vitro. This DNA sequence 
is from position 1 to position 1848 of the SABP DNA sequence (SEQ ID No 3). Polymerase chain reaction 
technology (PCR) was used to amplify this region of the SABP DNA directly from the cloned gene. 

Sequences corresponding to restriction endonuclease sites for PvuII or ApaI were incorporated into
the oligonucleotide sequence of the probes used in PCR amplification in order to facilitate insertion of the
PCR-amplified regions into the pRE4 vector (see below). The specific oligonucleotides,
5'-ATCGATCAGCTGGGAAGAAATACTTCATCT-3' (SEQ ID NO:17) and 5'-ATCGATGGGCCCCGAAGTTTGTTCATTATT-3'
(SEQ ID NO:18) were synthesized. These oligonucleotides were used as primers to PCR-amplify the region of the
DNA sequence encoding the cysteine-rich amino terminal region of the SABP protein.
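
The placement of the cloning sites in the two SABP primers can be checked with a short script. This is only an illustrative sketch: the recognition sequences used (CAGCTG for PvuII, GGGCCC for ApaI) are the standard sites for those enzymes, and the function and variable names are hypothetical, not part of the described method.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"PvuII": "CAGCTG", "ApaI": "GGGCCC"}

# Primers as given in the text (SEQ ID NO:17 and SEQ ID NO:18).
PRIMERS = {
    "SEQ ID NO:17": "ATCGATCAGCTGGGAAGAAATACTTCATCT",
    "SEQ ID NO:18": "ATCGATGGGCCCCGAAGTTTGTTCATTATT",
}

def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based offset} for each recognition site found in the primer."""
    hits = {}
    for enzyme, site in SITES.items():
        pos = primer.find(site)
        if pos != -1:
            hits[enzyme] = pos
    return hits

site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
for name, hits in site_map.items():
    print(name, hits)
```

Each primer carries exactly one of the two sites, immediately after a six-base 5' extension, so a double digest of the amplified product would be expected to leave one end compatible with each site of the vector.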

PCR conditions were based on the standard described in Saiki, et al., Science 239: 487-491 (1988).
Template DNA was provided from cloned fragments of the gene encoding SABP which had been spliced and re-cloned
as a single open-reading frame piece.

The vector, pRE4, used for expression in Cos cells is shown in Figure 2. The vector has an SV40 origin 
of replication, an ampicillin resistance marker and the Herpes simplex virus glycoprotein D gene (HSV glyd) cloned 
downstream of the Rous sarcoma virus long terminal repeats (RSV LTR). Part of the extracellular domain of the HSV 
glyd gene was excised using the PvuII and ApaI sites in HSV glyd.

As described above, the PCR oligonucleotide primers contained the PvuII or ApaI restriction sites.
The PCR-amplified DNA fragments obtained above were digested with the restriction enzymes PvuII and ApaI and
cloned into the PvuII and ApaI sites of the vector pRE4. These constructs were designed to express regions of the
SABP protein as chimeric proteins with the signal sequence of HSV glyd at the N-terminal end and the
transmembrane and cytoplasmic domain of HSV glyd at the C-terminal end. The signal sequence of HSV glyd targets
these chimeric proteins to the surface of Cos cells and the transmembrane segment of HSV glyd anchors these
chimeric proteins to the Cos cell surface.

Mammalian Cos cells were transfected with the pRE4 constructs containing the PCR-amplified
SABP DNA regions, by calcium phosphate precipitation according to standard techniques. 

2. Expression of the DABP binding domain polypeptide on the surface of Cos cells. 

To demonstrate that the amino-terminal, cysteine-rich region of the DABP protein is the binding
domain, this region was expressed on the surface of Cos cells. This region of the DNA sequence from position 1-975
was first PCR-amplified (SEQ ID No 1).

Sequences corresponding to restriction endonuclease sites for PvuII or ApaI were incorporated into
the oligonucleotide probes used for PCR amplification in order to facilitate subsequent insertion of the amplified DNA
into the pRE4 vector, as described above. The oligonucleotides, 5'-TCTCGTCAGCTGACGATCTCTAGTGCTATT-3' (SEQ
ID NO:19) and 5'-ACGAGTGGGCCCTGTCACAACTTCCTGAGT-3' (SEQ ID NO:20) were synthesized. These
oligonucleotides were used as primers to amplify the region of the DABP DNA sequence encoding the cysteine-rich,
amino terminal region of the DABP protein directly from the cloned DABP gene, using the same conditions described
above.

The same pRE4 vector described above in the section on expression of SABP regions in Cos cells
was also used as a vector for the DABP DNA regions.

3. Binding studies with erythrocytes . 

To demonstrate their ability to bind human erythrocytes, the transfected Cos cells expressing 
binding domains from DABP and SABP were incubated with erythrocytes for two hours at 37°C in culture media 
(DMEM/10% FBS). The non-adherent erythrocytes were removed with five washes of phosphate-buffered saline and
the bound erythrocytes were observed by light microscopy. Cos cells expressing the amino terminal cysteine-rich
SABP polypeptides on their surface bound untreated human erythrocytes, but did not bind neuraminidase treated
erythrocytes, that is, erythrocytes which lack sialic acid residues on their surface. Cos cells expressing other regions
of the SABP protein on their surface did not bind human erythrocytes. These results identified the amino-terminal,
cysteine-rich region of SABP as the erythrocyte binding domain and indicated that the binding of Cos cells expressing
these regions to human erythrocytes is specific. Furthermore, the binding of the expressed region to erythrocytes
is identical to the binding pattern seen for the authentic SABP-175 molecule upon binding to erythrocytes.

Similarly, Cos cells expressing the amino-terminal cysteine-rich region of DABP on their surface 
bound Duffy-positive human erythrocytes, but did not bind Duffy-negative human erythrocytes, that is erythrocytes 
which lack the Duffy blood group antigen. Cos cells expressing other regions of the DABP protein on their surface
did not bind human erythrocytes. These results identified the amino-terminal cysteine-rich region of DABP as the
erythrocyte binding domain and indicated that the binding of the Cos cells was specific.
EXAMPLE 2: Isolation of polynucleotide sequences in the DBL family

P. falciparum clones and cell lines used include the following. P. falciparum clones 3D7, D10, LF4/1,
Camp/A1, SL/D6, HB3, 7G8, V1/S, T2/C6, KMWII, UG2F6, FCR3/A2 and Dd2 have been previously tabulated (Dolan,
et al. (1993), Mol. Biochem. Parasitol. 61, 137-142). Line Dd2/NM1 was selected from clone Dd2 for invasion via
a sialic acid-independent pathway (Dolan, et al. (1990), J. Clin. Invest. 86, 618-624). All parasites were maintained
in vitro by standard methods (Trager, et al. (1976), Science 193, 673-675).

DNA and RNA Isolation and Analysis. DNA was extracted as described (Peterson, et al. (1990),
Proc. Natl. Acad. Sci. USA 87, 3018-3022). Endonuclease digestion, agarose gel electrophoresis, and filter
hybridizations were performed by standard methods (Sambrook, et al., 1989). All hybridizations were at 56°C
(Sambrook, et al., 1989). Blots were washed for 2 min. at room temperature in 2x standard saline/phosphate/EDTA
(SSPE) with 0.5% SDS, followed by two higher stringency washes at 50°C in 0.3x SSPE with 0.5% SDS. Parasite
chromosomes were embedded in agarose blocks and separated by pulsed field gel electrophoresis (Dolan, et al.
(1993), Methods Mol. Biol. 21, 319-332). RNA was isolated from cultured parasites by LiCl extraction of
Catrimox-14-precipitated RNA (Dahle, et al. (1993), BioTechniques 15, 1102-1105). Agarose gel electrophoresis of
total RNA and filter hybridizations were performed by standard methods (Sambrook, et al., 1989).

Oligonucleotide Primers and PCR. Primers specific for E31a used in a RT-PCR to test for 
expression of this sequence were E31aT2 (5'-AGA-CCT-CAA-TTT-CTA-AG-3') (SEQ ID NO:21) and E31aRev1
(5'-AAT-CGC-GAG-CAT-CAT-CTG-3') (SEQ ID NO:22).

Two primers were used to amplify additional sequences from genes encoding DBL domains. These
were designed from conserved amino acids encoded in the DBL domain of the eba-175 and E31a sequences. After
adaptation to incorporate the most frequently used P. falciparum codons, forward primer UNIEBP5'
[5'-CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG-3'] (SEQ ID NO:23), based upon the amino acid sequence
PRRQKLC, and reverse primer UNIEBP3' [5'-ULA-iaj i %[ i/UH l7K)A(A/G)(A/G)AA-TTG-(A/T)GG-3'] (SEQ ID NO:24),
based upon the amino acid sequence PQFLRW, were synthesized.
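
To make the degeneracy of the forward primer concrete, the following sketch enumerates every sequence it represents and translates the complete codons. It assumes the forward primer reads 5'-CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG-3' and uses the standard genetic code; the helper names are hypothetical and the reverse primer is not modeled.

```python
from itertools import product

# Forward primer UNIEBP5' (assumed reading), hyphens removed;
# "(A/G)" marks a position where either base is synthesized.
UNIEBP5 = "CC(A/G)AG(G/A)AG(G/A)CAA(G/A)AA(C/T)TATG"

def parse_degenerate(primer: str) -> list:
    """Split a primer into per-position base choices, e.g. 'C(A/G)' -> ['C', 'AG']."""
    positions, i = [], 0
    while i < len(primer):
        if primer[i] == "(":
            close = primer.index(")", i)
            positions.append(primer[i + 1:close].replace("/", ""))
            i = close + 1
        else:
            positions.append(primer[i])
            i += 1
    return positions

def expand(primer: str) -> list:
    """List every concrete sequence represented by the degenerate primer."""
    return ["".join(p) for p in product(*parse_degenerate(primer))]

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(seq: str) -> str:
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

variants = expand(UNIEBP5)
peptides = sorted({translate(v) for v in variants})
print(len(variants), peptides)
```

Under these assumptions the five two-fold positions give 32 primer species. The six complete codons encode PRRQKL or PRRQEL, and the final TG supplies the first two bases of a cysteine codon, so the primer covers the PRRQKLC motif while also tolerating glutamate at the lysine position.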
RT-PCR amplifications were performed as described (Kawasaki, et al. (1990), PCR Protocols, A
Guide to Methods and Applications, eds. Innis, M.A., Gelfand, D.H., Sninsky, J.J. & White, T.J. (Academic, San
Diego), pp. 21-27). In brief, 0.5 to 1 mg of total RNA was treated with RQ1 DNAse (Promega), phenol/chloroform
extracted, and ethanol precipitated. The RNA was then annealed with random oligonucleotide primers and extended
with Superscript reverse transcriptase (GIBCO/BRL). PCR cycling conditions were 94°C for 10 sec, 45°C for 15
sec, and 72°C for 45 sec, for 30 cycles. All PCRs were performed in an Idaho Technology air thermal cycler using
buffer containing 2 mM Mg2+.
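
The cycling program can be written down as a small data structure, which makes the total programmed hold time easy to compute. This is a sketch only: the annealing temperature is taken as 45°C, ramp times are ignored, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int    # hold temperature in degrees Celsius
    seconds: int   # hold time per cycle

# Cycling program as stated in the text; 45 C annealing is the reading assumed here.
PROGRAM = (
    Step("denature", 94, 10),
    Step("anneal", 45, 15),
    Step("extend", 72, 45),
)
CYCLES = 30

def total_hold_seconds(program, cycles: int) -> int:
    """Sum of programmed hold times over all cycles (ramp times excluded)."""
    return cycles * sum(step.seconds for step in program)

hold_s = total_hold_seconds(PROGRAM, CYCLES)
print(f"{hold_s} s total hold time ({hold_s / 60:.0f} min)")
```

The short holds add up to 35 minutes of programmed time for 30 cycles, which fits the rapid temperature transitions of an air thermal cycler.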

PCR amplification products were separated by use of PCR Purity Plus gels and protocols (AT 
Biochem, Malvern, PA). 

DNA Clones and Hybridization Probes. Clone pE31a was isolated from a genomic library
prepared from the region of chromosome 7 linked to chloroquine resistance (Walker-Jonah, et al. (1992), Mol.
Biochem. Parasitol. 51, 313-320). Clone pS31H (GenBank accession no. L38454), containing an insert encompassing
that of pE31a, was cloned from a size-selected Hind III restriction digest of Dd2 genomic DNA.

Clone pEBLel was cloned from a RT-PCR of Dd2 cDNA after amplification with primers UNIEBP5'
(SEQ ID NO:23) and UNIEBP3' (SEQ ID NO:24). Clone pEBP1.2 (GenBank accession no. L38450), containing an insert
encompassing that of pEBLel, was isolated from a Dd2 cDNA library probed with pEBLel. DBL-encoding sequences
of dbl-nm1-4 (GenBank accession no. L38455) and dbl-nm1-5 (GenBank accession no. L38453) were amplified by
RT-PCR from first strand cDNA of line Dd2/NM1 using primers UNIEBP5' and UNIEBP3'. Sequencing was performed
on double stranded DNA templates by standard protocols for the dideoxynucleotide method (Sequenase; U.S.
Biochemicals).

Sequences related to the E31a sequence were detected with the 3005 bp insert of clone pS31H.
The eba-175 gene was detected with a PCR amplified probe consisting of the first 1825 bp of the coding sequence.
ebl-1 sequences were detected with the 2098 bp insert of clone pEBP1.2. All probes were comparable in
organization, each containing a region encoding at least one DBL domain and varying amounts of flanking sequence.

Homology searches and alignments. Homology searches were performed with BLAST and the
Genetics Computer Group program FASTA (Altschul, et al. (1990), J. Mol. Biol. 215, 403-410; Devereux, et al.
(1984), Nucleic Acids Res. 12 (1 Pt 1), 387-395). Optimized alignments were produced with MACAW sequence
alignment software (Schuler, et al. (1991), Proteins 9, 180-190).

Multiple P. falciparum sequences encode DBL domains. Positional cloning experiments directed
to P. falciparum chromosome 7 identified an ORF (E31a) encoding a DBL domain that is homologous to the domains
found in the P. vivax and P. knowlesi DABPs and the P. falciparum SABP. Figure 4 shows the relative position of
the E31a ORF on chromosome 7.

The homology between the DBL domains of E31a and the erythrocyte-binding proteins is due to
the presence of short motifs of highly conserved amino acids. These well-conserved stretches are separated by
non-homologous sequences and by deletions and insertions that vary the size of the domain by greater than 60 aa.
The typical DBL domain contains 12 or more cysteine residues and has 7 conserved tryptophan residues. Additional
well conserved amino acids include 4 arginines, 3 aspartates, 9 positions with aliphatic residues (alanine, isoleucine,
leucine, or valine) and 4 with aromatic amino acids (tryptophan, phenylalanine, or tyrosine).

Probes spanning the sequence that encodes the E31a DBL domain hybridized to multiple fragments 
within a single restriction digest and yielded bands that varied among parasite lines. The numerous distinct bands 
from a selection of different parasite DNAs indicated a large number of diverse but related elements. These multiple
bands varied among different P. falciparum clones, in contrast to the well-conserved, single-copy signal obtained with
the eba-175 probe.

Because of the numerous cross-hybridizing sequences, it seemed likely that many of these related
sequences would be on different chromosomes of the parasite. PFG electrophoresis of P. falciparum Dd2
chromosomes and hybridization with the E31a probe identified a number of cross-hybridizing sequences on multiple
chromosomes. A control hybridization with the eba-175 probe under identical conditions yielded a single band of
hybridization from chromosome 7.

RNA Analysis of DBL Elements. Sequences from E31a (pS31H insert) were used to probe RNA
blots for corresponding transcripts. No hybridization was detected. Because it was still possible that a message
of low abundance was not being detected on the RNA blot, RT-PCR was used as a means of more sensitive
detection. For this purpose, cDNA was generated by RT from random primers annealed to DNAse-treated total RNA.
E31a-specific oligonucleotides were then used to test for amplification from the cDNA. No amplification of the E31a
sequence was obtained, while genomic DNA controls and amplification from cDNA by dihydrofolate
reductase/thymidylate synthetase-specific primers yielded the expected bands. A screen of a cDNA library with
E31a-specific probes also failed to detect any clones hybridizing with the ORF. These results indicate that E31a is either
a pseudogene, or is expressed in parasite strains or stages not examined in this work.

A PCR Method to Isolate Sequences Encoding DBL Domains. The identification of short 
conserved motifs in DBL domains that otherwise have extreme diversity led to a PCR strategy using degenerate 
oligonucleotide primers designed from conserved amino acid sequences in the DBL domains. Sequences PRRQKLC
and PQFLRW were judged most suitable for minimizing degeneracy while allowing amplification of expressed DBL
sequences. After these considerations and adjustment for P. falciparum codon usage, primers UNIEBP5' and
UNIEBP3' were synthesized.

While some P. falciparum lines yielded similar patterns of amplified bands (e.g., Dd2 and MCamp;
FCR3/A2 and K-1), no two separate isolates showed identical patterns, reflecting the diversity of the DBL domains
in the parasite lines. A few bands of the same apparent size were present in many isolates. These included a
consistent 490 bp product that was determined to be the eba-175 gene by its expected size and hybridization to
a gene-specific probe. The number of discernible bands probably underestimates the number of amplifiable sequences
because of overlapping products of the same size and possible preferential amplification of some sequences over
others. Nevertheless, the parasite-specific patterns in the amplified bands may provide a means to quickly type
isolates and serve as a measure of parasite diversity in field samples.
To identify DBL-encoding sequences in RNA transcripts, the UNIEBP primers were used to amplify
first-strand cDNAs generated from DNAse-treated RNA preparations. Amplified products from Dd2, 3D7, HB3 and
MCAMP cDNAs had diverse sizes ranging from 400 bp to nearly 1 kb. These included a band at 480-500 bp that
was determined to be eba-175 from its expected size and cross-hybridization to an eba-175-specific probe. Other
bands were from amplification of different transcripts encoding DBL domains. Dd2-NM1 RNA, for example, yielded
bands above the eba-175 product that included two related sequences (dbl-nm1-4, dbl-nm1-5). These bands were
found to be isolate-specific and to have features consistent with the var genes described in Example 3, below.
Probes that detect dbl-nm1-4 and dbl-nm1-5 hybridized to multiple chromosomes and aligned more closely with E31a
than with EBA-175 or DABP.

The RT-PCR amplifications also yielded a consistent band that encoded a novel DBL domain distinct
from eba-175. A cDNA clone corresponding to this product was isolated by screening a λgt10 Dd2 cDNA library
with a radiolabeled ebl-1 probe. Sequence from this and additional overlapping cDNA clones confirmed the conserved
motifs of the DBL domain. The alignment of the predicted amino acid sequences showed that the DBL domain of
ebl-1 is more similar to eba-175 than to the multicopy genes. There was, however, extensive divergence from
eba-175 and other known genes outside of the amplified region.

In contrast to the multicopy hybridization patterns of dbl-nm1-4 and dbl-nm1-5, the ebl-1 sequence,
like that of eba-175, was found to have hybridization patterns consistent with a conserved single-copy gene. Probes
specific for ebl-1 hybridized only to chromosome 13, and restriction analysis with the enzymes Cla I, EcoRI, //Mil,
HM I, Nsi I, Bsa I, and Spe I all yielded bands expected from a single copy sequence. RNA blots probed with
ebl-1-specific sequences showed several bands of hybridization, however, corresponding to 8-9.5 kb transcripts in
mRNA from the Dd2 and 3D7 parasites. The transcripts of different size may result from alternative start and
termination points or from incompletely processed species containing introns.
EXAMPLE 3: Isolation of var genes

Parasite clones, DNA analysis and chromosome mapping. Parasite clones were cultivated by the methods
of Trager, et al. (1976), Science 193, 673-675. DNA was extracted from parasite cultures as described (Peterson,
et al. (1988), Proc. Natl. Acad. Sci. USA 85, 9114-9118) except that the DNA was recovered by ethanol
precipitation rather than spooling. Fingerprint analysis with the pC4.H32 probe was used to confirm DNA
preparations (Dolan, et al. (1993), Mol. Biochem. Parasitol. 61, 137-142). Southern blotting to Nytran membranes
was performed as recommended by the manufacturer (Schleicher & Schuell, Keene, NH). PFG separation of the 14
P. falciparum chromosomes and chromosome mapping were performed as described (Wellems, et al. (1987), Cell 49,
633-642; Sinnis, et al. (1988), Genomics 3, 287-295).

RNA isolation. Parasites from 200 ml mixed stage cultures (5-10% parasitemia) were released by saponin
lysis as for DNA preparations except that the procedures were performed with ice-cold solutions. RNA was
immediately isolated from the parasite pellet by guanidine thiocyanate/phenol-chloroform methods, recovered and
treated with RNAase-free DNAse (Creedon, et al. (1994), J. Biol. Chem. 269, 16364-16370). RNA in H2O was
combined with 2 vol 100% EtOH, distributed into 2 ml vials and frozen as stock at -70°C. RNA was recovered by
precipitation with 0.1 vol 3M NaOAc. RNA blots were generated and probed as described (Creedon, et al. (1994),
J. Biol. Chem. 269, 16364-16370).

YAC isolation, chromosome-segment libraries and cDNA libraries. Overlapping YACs spanning the 300 kb
segment of chromosome 7 that contains the CQR locus were obtained from a YAC library of a CQR FCR3 parasite
line (de Bruin, et al. (1992), Genomics 14, 332-339) by the procedures of Lanzer, et al. (1993), Nature 361, 654-657.
Orientation of the YACs and their overlaps were identified with probes obtained from the YAC ends by inverted PCR.

Attempts to construct cosmid libraries and large insert (~10 kb) λ libraries from high molecular
weight P. falciparum genomic DNA yielded only rearranged clones. An alternative approach was therefore taken in
which chromosome-segment libraries were constructed that contained small (0.5-5 kb) inserts in plasmid vectors.
Plasmid libraries containing AluI, HinfI, RsaI and SspI inserts in pCDNAII were constructed from Dd2 chromosome
7 restriction fragments purified by pulsed-field gel (PFG) electrophoresis (Wellems, et al. (1991), Proc. Natl. Acad.
Sci. USA 88, 3382-3386). A plasmid library from a 34 kb ApaI/SmaI restriction fragment of YAC PfYED9 was
constructed by the same methods. Inserts in the plasmid libraries were generally 0.5-4 kb.

The λgt10 Dd2 cDNA library was prepared under contract by CloneTech Laboratories Inc. (Palo
Alto, CA) from the DNAse-treated, poly A+ fraction of Dd2 RNA. The cDNA was generated in two separate reactions
using oligo-dT primers or random primers. Products of these reactions were combined, processed and cloned into the
EcoRI site of λgt10. 1.6 x 10^6 independent recombinants were obtained and amplified.

Isolation of overlapping clones and DNA sequencing. Plasmid clones from the chromosome-segment and
YAC-segment libraries were picked at random and their locations were established by restriction mapping. After
sequence data from these clones were generated, overlapping clones were isolated in a process of "chromosome
walking" by rescreening the libraries with oligonucleotide probes near the ends of sequenced inserts. Sufficient
divergence was present among repetitive elements in the sequences to allow distinction of clones and unambiguous
assignment of overlaps (generally 50-200 bp).

Sequencing reactions with single-strand M13 DNA (1 μg) and double-strand plasmid DNA (2-5 μg)
were performed in 96-well polyvinyl chloride U-bottom microassay plates using a Sequenase protocol recommended
by United States Biochemical Corp. (Cleveland, OH). Reactions were separated by 8M urea-6% polyacrylamide
sequencing gels and exposed to Kodak BioMax MR film. Sequence data from some clones were also obtained by
use of an ABI 373A automated DNA sequencer (Applied Biosystems Inc., Foster City, CA). Cycle sequencing
reactions were performed using the ABI PRISM DyeDeoxy system.

DNA sequence editing, analyses and display were performed with MacVector software (International 
Biotechnologies Inc., New Haven, CT), BLAST (Altschul, et al. (1990), J. Mol. Biol. 215, 403-410), Genetics Computer
Group programs (Devereux, et al. (1984), Nucleic Acids Res. 12, 387-395) and the DNADRAW package (Shapiro, et
al. (1986), Nucleic Acids Res. 14, 65-73) maintained at the National Institutes of Health.

Identification of a large hypervariable region within a chromosome 7 segment linked to chloroquine
resistance. Four overlapping yeast artificial chromosomes from the P. falciparum FCR3 line were obtained that span
the 300 kb chromosome segment linked to CQR, a segment located 300-600 kb from the telomere of chromosome
7. Figure 5 shows the positions of these YACs (PfYEF2, PfYFE6, PfYKF8, PfYED9) relative to the chromosome map.
In order to define the structure of this 300 kb segment, we performed comparative hybridizations to search for
polymorphisms between parasite lines. Clones were randomly picked from chromosome segment-specific plasmid
libraries and their inserts were hybridized against restriction digests of the YAC and parasite DNAs. Over thirty
inserts were identified that recognized PfYEF2, PfYFE6 or PfYKF8 and showed a preponderance of single copy
sequences with few polymorphisms (HinfI, RsaI and SspI digests), consistent with prior findings that
chromosome internal regions are largely conserved and contain a preponderance of single copy sequences. However,
fifteen other inserts that recognized PfYED9 showed highly polymorphic sets of repetitive elements in the parasite
DNAs. Southern analysis indicated that these polymorphic elements were part of a chromosome hypervariable region
contained within the PfYED9 clone.

Mapping and DNA sequencing of the hypervariable region spanned by YAC PfYED9. Single copy sequences
detected by pE45b and pH270.5 flank the hypervariable region spanned by PfYED9 (Figure 5). The pE45b and
pH270.5 probes were therefore used to assign large restriction fragments on the PfYED9 map and establish enzyme
recognition sites as reference points. A detailed restriction map of the PfYED9 hypervariable region was then
developed. Fifteen overlapping clones (V -T and n h n -V in Figure 5) were isolated by a chromosome walking
approach from Dd2 chromosome subsegment libraries (Wellems et al., supra). The inserts yielded 19.1 kb of
continuous Dd2 sequence having predicted enzyme recognition sites in perfect accord with the PfYED9 restriction
map. Such agreement indicates that the Dd2 and FCR3 sequences in this part of the chromosome are very similar,
despite differences elsewhere in the genome that are evident by restriction analysis.

We also obtained genomic sequence data from the 34 kb ApaI/SmaI fragment of PfYED9. Purified 
PfYED9 DNA was cut with SmaI to yield a 110 kb fragment, which was then isolated by PFG electrophoresis and 
digested with ApaI. The resulting 34 kb ApaI/SmaI band was purified by PFG electrophoresis, digested in four 
separate reactions by AluI, HinfI, RsaI or SspI and incorporated into a plasmid (pCDNAII) library. Cloned inserts from 
the library were checked for hybridization to the PfYED9 34 kb fragment, assigned to the PfYED9 map and 
sequenced (Figure 5). Overlapping inserts were obtained by the chromosome walking approach except for three gaps 
(marked in Figure 5), which were closed by PCR amplification of PfYED9 DNA using primers from flanking 
sequences. The clones from PfYED9 (labeled in Figure 5) yielded 22.2 kb of continuous DNA 
sequence that overlaps the Dd2 sequence at the "fT0" junction and has predicted restriction sites that match the 
PfYED9 map perfectly. The composite sequence from the Dd2 and PfYED9 segments is 40,171 bp. 
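
The reported lengths allow a quick consistency check: the overlap between the Dd2 and PfYED9 contigs is implied by the difference between their summed lengths and the composite length. The following is a minimal sketch only; the function and variable names are illustrative assumptions, and the result inherits the rounding of the 19.1 kb and 22.2 kb figures.

```python
# Contig lengths (kb) and composite length (bp) as reported in the text.
DD2_CONTIG_KB = 19.1
PFYED9_CONTIG_KB = 22.2
COMPOSITE_BP = 40171


def implied_overlap_bp(len_a_kb, len_b_kb, composite_bp):
    """Overlap = summed contig lengths minus the merged (composite) length."""
    return round((len_a_kb + len_b_kb) * 1000) - composite_bp


overlap_bp = implied_overlap_bp(DD2_CONTIG_KB, PFYED9_CONTIG_KB, COMPOSITE_BP)
print(f"Implied overlap between the two contigs: about {overlap_bp} bp")
```

An implied overlap on the order of 1 kb is what one would expect if the two contigs share a single junction region, but this is an inference from rounded values rather than a reported measurement.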

Structure of a var gene cluster and comparative analysis of predicted amino acid sequences. The 40,171 
bp sequence contains three 10-12 kb regions that have related sequences and structure. Each of these regions 
harbors a pair of ORFs. The first ORF in each pair begins with a consensus ATG start codon preceded by typical 
P. falciparum non-coding sequence of abundant A+T content. The ORFs of each pair are separated by an intervening 
AT-rich and non-coding sequence of 0.9 kb to 1.1 kb. Presence of consensus intron-exon splice junction sequences 
at either end of these intervening sequences and lack of a consistent translation start site in the 3' ORF indicate 
that each pair of ORFs belongs to an individual gene having a two exon structure. This has been verified by 
comparison of the genomic sequences to the cDNA sequence of an expressed gene (var-7; see subsequent section). 
The three 10 kb to 12 kb regions thus contain members of a variant gene family which have coding regions of 
9.23 kb (var-1), 7.99 kb (var-2) and 9.01 kb (var-3). Predicted molecular weights of the encoded proteins are 350 
kD, 302 kD and 344 kD, respectively. 
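
The stated molecular weights can be sanity-checked against the coding-region lengths. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from this document, and ignores the stop codon; it is a rough estimate rather than the method used to obtain the predicted values.

```python
# Assumed rule-of-thumb average mass per amino acid residue (not from the source).
AVG_RESIDUE_MASS_DA = 110.0

# Coding-region length (kb) and stated molecular weight (kD) for each gene.
REPORTED = {
    "var-1": (9.23, 350),
    "var-2": (7.99, 302),
    "var-3": (9.01, 344),
}


def estimate_mass_kd(coding_kb):
    """Estimate protein mass in kD from coding length (three bases per codon)."""
    codons = coding_kb * 1000 / 3
    return codons * AVG_RESIDUE_MASS_DA / 1000


for gene, (coding_kb, stated_kd) in REPORTED.items():
    estimate = estimate_mass_kd(coding_kb)
    print(f"{gene}: estimated {estimate:.0f} kD, stated {stated_kd} kD")
```

Under this assumption the estimates fall within a few percent of the stated values, which is consistent with the coding-region sizes given above.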
The var genes are flanked by additional members of the var family in PfYED9. Restriction analysis 
identified two additional genes that are 12-35 kb upstream of the sequenced region and are closely related to var-2 
and var-3 (var-2c and var-3c, Figure 5). The var genes thus have a clustered arrangement in which many individual 
members are organized in head-to-tail fashion. Between var-1 and var-2 is a 5 kb DNA sequence that harbors a short 
ORF homologous to that of a repetitive element (rif) suggested to be a transposable element in P. falciparum. 

The deduced protein sequences of the var genes are highly diverse, yet all contain certain 
conserved motifs and common structural features. Database searches identified 2 to 4 domains within each var 
sequence that are homologous to cysteine-rich domains of SABP and DABP. In the var sequences, the first domain 
near the amino-terminus (DBL domain 1) is the most conserved of the DBL domains and has amino acid signatures 
that differentiate it from subsequent domains (e.g. consensus peptide sequences GAcAp[Y/F]rrL, 
CTxLARsfadlgdlVrgrdLYLG and VPTYFDYVpqylrwF). Between DBL domains 1 and 2 is another type of conserved 
domain, a cysteine-rich interdomain region (CIDR) of 300-400 amino acids. The CIDR does not have all the motifs 
of a DBL domain, but it does have a region at the 3' end which is homologous to the end of the F1 DBL domain in 
SABP. The conservation evident in the sequences of DBL domain 1 and the CIDR suggests that these regions maintain 
important structures in the head of the variant molecule. 
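
A consensus signature of this kind can be turned into a search pattern. The sketch below assumes, as an interpretation not stated in the text, that uppercase letters mark invariant residues, that lowercase letters and "x" mark variable positions, and that "[Y/F]" lists alternatives. The helper name and the toy peptide are hypothetical.

```python
import re


def consensus_to_regex(consensus):
    """Convert a consensus string into a compiled regular expression.

    Assumed conventions: uppercase = required residue, lowercase or 'x' =
    any residue, '[A/B]' = any one of the listed alternatives.
    """
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "[":
            end = consensus.index("]", i)
            alternatives = consensus[i + 1:end].replace("/", "")
            parts.append("[" + alternatives + "]")
            i = end + 1
            continue
        parts.append(ch if ch.isupper() else ".")
        i += 1
    return re.compile("".join(parts))


pattern = consensus_to_regex("GAcAp[Y/F]rrL")
toy_peptide = "MKGACAPFRRLQ"  # made-up sequence, for illustration only
match = pattern.search(toy_peptide)
print(pattern.pattern, match.group() if match else None)
```

Treating lowercase positions as wildcards is deliberately permissive; a stricter search could instead require the lowercase residue in a majority of positions.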

DBL domains 2, 3 and 4 (numbering is according to var-1, the first sequence completed) have 
less discriminating signatures than domain 1, and show features of cross-alignment and variation in number that 
suggest these domains can undergo shuffling and deletion. 

DBL domain 4 is followed by a segment of variable length and a hydrophobic region that is 
encoded at the end of the first exon (exon I). In all var sequences this hydrophobic region fits the criteria of a 
transmembrane segment. The second exon (exon II) encodes a large (45-55 kD) conserved C-terminal sequence that 
has an acidic character (predicted pI 4.5, vs. 5.9 for the part of the protein upstream of the splice junction) and 
a cysteine content of < 1% (vs. > 4% upstream). The position of this C-terminal sequence downstream of a 
single transmembrane segment suggests that it has a cytoplasmic location. 
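
The cysteine comparison across the splice junction amounts to a simple residue count on either side of a split point. The following sketch uses a hypothetical toy sequence and a hypothetical junction index; it is not a var sequence and only illustrates the calculation.

```python
def cysteine_fraction(protein):
    """Fraction of residues that are cysteine (one-letter code 'C')."""
    residues = protein.upper()
    return residues.count("C") / len(residues) if residues else 0.0


def compare_across_junction(protein, junction_index):
    """Return cysteine fractions (upstream, downstream) of a split point."""
    return (
        cysteine_fraction(protein[:junction_index]),
        cysteine_fraction(protein[junction_index:]),
    )


# Hypothetical toy protein: a cysteine-rich half followed by an acidic half.
toy_protein = "MCKCDECAKC" + "DEEDSEEDKE"
upstream, downstream = compare_across_junction(toy_protein, 10)
print(f"upstream {upstream:.0%}, downstream {downstream:.0%}")
```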

No consensus signal sequence was detected in the NH2-terminal region of the predicted var ORFs. 
We note the presence of several motifs in the protein sequences that are known to act as ligands and receptors in 
the integrin family. These include RGD (var-1 codons 886-88, 1992-94) and DGEA (var-1 codons 2111-14). Not 
all of these motifs occur in each protein sequence and, when they do occur, their positions vary. 
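
Locating such short motifs and reporting their codon positions is a plain substring search. A minimal sketch follows; the function name and the toy sequence are illustrative assumptions, and positions are reported 1-based to match the codon numbering used above.

```python
import re

INTEGRIN_MOTIFS = ("RGD", "DGEA")


def find_motifs(protein, motifs=INTEGRIN_MOTIFS):
    """Return (motif, first_codon, last_codon) tuples, 1-based, sorted by position."""
    hits = []
    for motif in motifs:
        # A lookahead is used so that overlapping occurrences are also found.
        for m in re.finditer(f"(?={motif})", protein):
            start = m.start() + 1
            hits.append((motif, start, start + len(motif) - 1))
    return sorted(hits, key=lambda hit: hit[1])


toy_sequence = "MARGDKLDGEARGD"  # made-up sequence, for illustration only
for motif, first, last in find_motifs(toy_sequence):
    print(f"{motif}: codons {first}-{last}")
```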

Identification of var transcripts and chromosome expression sites. To identify transcribed var sequences 
we screened a λgt10 Dd2 cDNA library with var-containing BssHII restriction fragments that had been purified from 
PfYED9 and radiolabeled by random hexamer priming. This screening yielded 18 clones with inserts that hybridized 
back to PfYED9. By cross-hybridization studies and DNA sequence analysis the inserts fell into two groups: group 
I inserts that aligned with sequences of var exon I (AT240, AT242, AT2M, AT2M, AT2B1, AT2M, AT295, AT296); 
and group II inserts that aligned with sequences of var exon II (AT140, AT141, AT142, AT145, AT147, AT148, 
AT150, AT152). 

The full ORF of an expressed var gene (var-7) was determined from AT242 and overlapping cDNA 
clones that were obtained by a PCR-based walking strategy. The sequence showed that var-7 has a 6.6 kb ORF 
containing two DBL domains, a hydrophobic transmembrane sequence and a carboxy-terminal region typical of var genes 
(predicted molecular weight 249 kD). Comparison of var-7 with the var-1 sequence demonstrated continuity of the 
alignments at the predicted splice junction between the ORFs of exons I and II. PCR amplification of Dd2 genomic 
DNA was also performed with primers derived from the two var-7 exons. Sequence of this var-7 PCR product 
confirmed consensus splice sites and a 1 kb intron typical of the var genes. Transcription of var-7 was detected 
as a 7.5 kb band by RNA blot analysis. 

Chromosome mapping experiments with a var-7-specific probe localized the var-7 gene to a region 
that is 600 kb from one end of Dd2 chromosome 12 (chromosome 12 has a length of 2600 kb). No hybridization 
of the var-7 probe was detected to any other Dd2 chromosome nor to any chromosomes of the HB3, 3D7 or A4 
parasites. Other cDNA inserts from the group I clones were also sequenced and examined for chromosome 
hybridization signals. The AT240 cDNA insert mapped to the var-1/var-2/var-3 cluster on Dd2 chromosome 7 and 
its sequence matched that of var-3. The A12U, AJ2M, AT2&1, AT288, AT2B5 and AT2BB inserts all showed 
overlapping sequences and yielded the same hybridization patterns. Chromosome sites recognized by these inserts 
included regions within two SmaI fragments from Dd2 chromosome 7 and another from chromosome 9. We note 
that loss of a cytoadherence phenotype has been correlated with a chromosome 9 deletion in certain P. falciparum 
lines. 

1.8 kb to 2.4 kb RNA transcripts related to var exon II. In addition to the 7.5 kb var-7 band, a broad 1.8 
kb to 2.4 kb band was detected on RNA blots after hybridization with a probe that recognizes var exon II. 
Sequences of eight group II cDNA inserts homologous to exon II were therefore determined and aligned against the 
var genes. Comparative analysis of the insert sequences showed that all differed from one another in regions of 
overlap, indicating that transcription of the corresponding RNAs was from different loci. Three of the cDNA 
sequences (AT140, AT141 and AT148) aligned downstream of the intron/exon II splice junction. However, five other 
cDNA inserts (AT142, AT145, AT147, AT150 and AT152) had sequences that aligned upstream of the var 
intron/exon II splice site and included regions homologous to var intron sequences. In the vicinity of the splice 
junction, consensus splice sites occurred in three of the cDNA sequences (AT142, AT147, AT150) while a fourth 
sequence (AT145) showed the required AG dinucleotide but not the expected pyrimidine tract of the splice consensus. 
The part of the fifth sequence (AT152) that aligned with the var intron extended upstream only to the TAG of the 
splice sequence. All five sequences lacked a consensus start codon preceded by A+T-rich non-coding DNA that is 
typical of P. falciparum translation start sites. 
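
The two features examined at the 3' splice site, a terminal AG dinucleotide and a preceding pyrimidine tract, can be checked programmatically. The sketch below is illustrative only: the window length and pyrimidine threshold are assumed values chosen for the example, not criteria taken from this work, and the test sequences are made up.

```python
def check_acceptor(intron_3prime, window=12, min_pyrimidine=0.75):
    """Check the 3' end of an intron for splice-acceptor features.

    Returns (has_ag, has_pyrimidine_tract). The tract is the `window` bases
    immediately upstream of the final two bases; `window` and
    `min_pyrimidine` are assumed, illustrative thresholds.
    """
    seq = intron_3prime.upper()
    has_ag = seq.endswith("AG")
    tract = seq[-(window + 2):-2]
    pyrimidines = sum(base in "CT" for base in tract)
    fraction = pyrimidines / len(tract) if tract else 0.0
    return has_ag, fraction >= min_pyrimidine


# Made-up examples: one with a T-rich tract, one with AG but a purine-rich tract.
with_tract = check_acceptor("AATTTTTCTTTTTTAG")
without_tract = check_acceptor("AAGAAGAAGAAGAAAG")
print(with_tract, without_tract)
```

The second example mirrors the situation described for the fourth cDNA sequence, where the AG dinucleotide is present but the upstream tract does not meet the consensus.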

Isolate-specific var sequences and evidence for DNA recombination in cultivated parasite clones. The 
diversity of var forms expressed by P. falciparum parasites reflects a tremendous repertoire in the var gene family. 
This repertoire is evident in the patterns of restriction polymorphism detected by var probes as well as in the 
detection of var-specific sequences that hybridize to some parasite DNAs but not to others. The var-7 gene 
expressed by Dd2, for example, is not present in the HB3, 3D7 or A4 genomes. Such var diversity suggests that 
frequent DNA rearrangements underlie the production of antigenically variant types in different parasite strains. 
To test for DNA rearrangements in parasites cultivated in vitro, we used var sequences to probe 
restricted DNAs from Dd2 lines adapted to neuraminidase-treated erythrocytes. In one rearrangement a novel 35 
kb BglI fragment is seen in NM1 DNA probed with the AT142 (group II) insert. In another rearrangement a deletion 
of a 20 kb PstI band is evident in NM8 DNA probed with a var-7 sequence. Deletion of this 20 kb band was also 
detected in the Dd2/R8 subclone obtained before neuraminidase selection, indicating that the DNA rearrangement was 
not produced by selection in neuraminidase-treated erythrocytes. 

The above examples are provided to illustrate the invention and other variants of the invention 
encompassed by the claims will be readily apparent to one of ordinary skill in the art. 



SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: The United States, As Represented by the 
Secretary, Department of Health and Human Services 

(ii) TITLE OF INVENTION: BINDING DOMAINS FROM PLASMODIUM VIVAX 
AND PLASMODIUM FALCIPARUM ERYTHROCYTE BINDING PROTEINS 

(iii) NUMBER OF SEQUENCES: 45 

(iv) CORRESPONDENCE ADDRESS: 
(A) ADDRESSEE: Knobbe Martens Olson & Bear 
(B) STREET: 620 Newport Center Drive 16th Floor 
(C) CITY: Newport Beach 
(D) STATE: California 
(E) COUNTRY: US 
(F) ZIP: 92660 

(v) COMPUTER READABLE FORM: 
(A) MEDIUM TYPE: Floppy disk 
(B) COMPUTER: IBM PC compatible 
(C) OPERATING SYSTEM: PC-DOS/MS-DOS 
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

(vi) CURRENT APPLICATION DATA: 
(A) APPLICATION NUMBER: 
(B) FILING DATE: 
(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 
(A) APPLICATION NUMBER: US08/487826 
(B) FILING DATE: 07-JUN-1996 

(viii) ATTORNEY/AGENT INFORMATION: 
(A) NAME: Israelsen, Ned 
(B) REGISTRATION NUMBER: 29,655 
(C) REFERENCE/DOCKET NUMBER: NIH12 1.0 01QPC 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: (619) 235-8550 
(B) TELEFAX: (619) 235-0176 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4084 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Plasmodium vivax 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

AAGCTTTTAA AAATAGCAAC AAAATTTCGA AACATTGCCA CAAAAATTTT ATGTTTTACA 60 



TATATTTAGA TTCATACAAT TTAGGTGTAC CCTGTTTTTT GATATATGCG CTTAAATTTT 120 
TTTTTCGCTC ATATGTTTAG TTATATGTGT AGAACAACTT GCTGAATAAA TTACGTACAC 180 
TTTCTGTTCT GAATAATATT ACCACATACA TTTAATTTTA AATACTATGA AAGGAAAAAA 240 
CCGCTCTTTA TTTGTTCTCC TAGTTTTATT ATTGTTACAC AAGGTATCAT ATAAGGATGA 300 
TTTTTCTATG ACACTAATAA ATTATCATGA AGGAAAAAAA TATTTAATTA TACTAAAAAG 360 
AAAATTAGAA AAAGCTAATA ATCGTGATGT TTGCAATTTT TTTCTTCATT TCTCTCAGGT 420 
AAATAATGTA TTATTAGAAC GAACAATTGA AACCCTTCTA GAATGCAAAA ATGAATATGT 480 
GAAAGGTGAA AATGGTTATA AATTAGCTAA AGGACACCAC TGTGTTGAGG AAGATAACTT 540 
AGAACGATGG TTACAAGGAA CCAATGAAAG AAGAAGTGAG GAAAATATAA AATATAAATA 600 
TGGAGTAACG GAACTAAAAA TAAAGTATGC GCAAATGAAT GGAAAAAGAA GCAGCCGCAT 660 
TTTGAAGGAA TCAATTTACG GGGCGCATAA CTTTGGAGGC AACAGTTACA TGGAGGGAAA 720 
AGATGGAGGA GATAAAACTG GGGAGGAAAA AGATGGAGAA CATAAAACTG ATAGTAAAAC 780 
TGATAACGGG AAAGGTGCAA ACAATTTGGT AATGTTAGAT TATGAGACAT CTAGCAATGG 840 
CCAGCCAGCG GGAACCCTTG ATAATGTTCT TGAATTTGTG ACTGGGCATG AGGGAAATTC 900 
TCGTAAAAAT TCCTCGAATG GTGGCAATCC TTACGATATT GATCATAAGA AAACGATCTC 960 
TAGTGCTATT ATAAATCATG CTTTTCTTCA AAATACTGTA ATGAAAAACT GTAATTATAA 1020 
GAGAAAACGT CGGGAAAGAG ATTGGGACTG TAACACTAAG AAGGATGTTT GTATACCAGA 1080 
TCGAAGATAT CAATTATGTA TGAAGGAACT TACGAATTTG GTAAATAATA CAGACACAAA 1140 
TTTTCATAGG GATATAACAT TTCGAAAATT ATATTTGAAA AGGAAACTTA TTTATGATGC 1200 
TGCAGTAGAG GGCGATTTAT TACTTAAGTT GAATAACTAC AGATATAACA AAGACTTTTG 1260 
CAAGGATATA AGATGGAGTT TGGGAGATTT TGGAGATATA ATTATGGGAA CGGATATGGA 1320 
AGGCATCGGA TATTCCAAAG TAGTGGAAAA TAATTTGCGC AGCATCTTTG GAACTGATGA 1380 
AAAGGCCCAA CAGCGTCGTA AACAGTGGTG GAATGAATCT AAAGCACAAA TTTGGACAGC 1440 
AATGATGTAC TCAGTTAAAA AAAGATTAAA GGGGAATTTT ATATGGATTT GTAAATTAAA 1500 
TGTTGCGGTA AATATAGAAC CGCAGATATA TAGATGGATT CGAGAATGGG GAAGGGATTA 1560 
CGTGTCAGAA TTGCCCACAG AAGTGCAAAA ACTGAAAGAA AAATGTGATG GAAAAATCAA 1620 
TTATACTGAT AAAAAAGTAT GTAAGGTACC ACCATGTCAA AATGCGTGTA AATCATATGA 1680 
TCAATGGATA ACCAGAAAAA AAAATCAATG GGATGTTCTG TCAAATAAAT TCATAAGTGT 1740 
AAAAAACGCA GAAAAGGTTC AGACGGCAGG TATCGTAACT CCTTATGATA TACTAAAACA 1800 
GGAGTTAGAT GAATTTAACG AGGTGGCTTT TGAGAATGAA ATTAACAAAC GTGATGGTGC 1860 
ATATATTGAG TTATGCGTTT GTTCCGTTGA AGAGGCTAAA AAAAATACTC AGGAAGTTGT 1920 
GACAAATGTG GACAATGCTG CTAAATCTCA GGCCACCAAT TCAAATCCGA TAAGTCAGCC 1980 
TGTAGATAGT AGTAAAGCGG AGAAGGTTCC AGGAGATTCT ACGCATGGAA ATGTTAACAG 2040 
TGGCCAAGAT AGTTCTACCA CAGGTAAAGC TGTTACGGGG GATGGTCAAA ATGGAAATCA 2100 
GACACCTGCA GAAAGCGATG TACAGCGAAG TGATATTGCC GAAAGTGTAA GTGCTAAAAA 2160 
TGTTGATCCG CAGAAATCTG TAAGTAAAAG AAGTGACGAC ACTGCAAGCG TTACAGGTAT 2220 
TGCCGAAGCT GGAAAGGAAA ACTTAGGCGC ATCAAATAGT CGACCTTCTG AGTCCACCGT 2280 
TGAAGCAAAT AGCCCAGGTG ATGATACTGT GAACAGTGCA TCTATACCTG TAGTGAGTGG 2340 
TGAAAACCCA TTGGTAACCC CCTATAATGG TTTGAGGCAT TCGAAAGACA ATAGTGATAG 2400 
CGATGGACCT GCGGAATCAA TGGCGAATCC TGATTCAAAT AGTAAAGGTG AGACGGGAAA 2460 
GGGGCAAGAT AATGATATGG CGAAGGCTAC TAAAGATAGT AGTAATAGTT CAGATGGTAC 2520 
CAGCTCTGCT ACGGGTGATA CTACTGATGC AGTTGATAGG GAAATTAATA AAGGTGTTCC 2580 
TGAGGATAGG GATAAAACTG TAGGAAGTAA AGATGGAGGG GGGGAAGATA ACTCTGCAAA 2640 
TAAGGATGCA GCGACTGTAG TTGGTGAGGA TAGAATTCGT GAGAACAGCG CTGGTGGTAG 2700 
CACTAATGAT AGATCAAAAA ATGACACGGA AAAGAACGGG GCCTCTACCC CTGACAGTAA 2760 
ACAAAGTGAG GATGCAACTG CGCTAAGTAA AACCGAAAGT TTAGAATCAA CAGAAAGTGG 2820 
AGATAGAACT ACTAATGATA CAACTAACAG TTTAGAAAAT AAAAATGGAG GAAAAGAAAA 2880 
GGATTTACAA AAGCATGATT TTAAAAGTAA TGATACGCCG AATGAAGAAC CAAATTCTGA 2940 
TCAAACTACA GATGCAGAAG GACATGACAG GGATAGCATC AAAAATGATA AAGCAGAAAG 3000 
GAGAAAGCAT ATGAATAAAG ATACTTTTAC GAAAAATACA AATAGTCACC ATTTAAATAG 3060 
TAATAATAAT TTGAGTAATG GAAAATTAGA TATAAAAGAA TACAAATACA GAGATGTCAA 3120 
AGCAACAAGG GAAGATATTA TATTAATGTC TTCAGTACGC AAGTGCAACA ATAATATTTC 3180 
TTTAGAGTAC TGTAACTCTG TAGAGGACAA AATATCATCG AATACTTGTT CTAGAGAGAA 3240 
AAGTAAAAAT TTATGTTGCT CAATATCGGA TTTTTGTTTG AACTATTTTG ACGTGTATTC 3300 
TTATGAGTAT CTTAGCTGCA TGAAAAAGGA ATTTGAAGAT CCATCCTACA AGTGCTTTAC 3360 
GAAAGGGGGC TTTAAAGGTA TGCAGAAAAA GATGCTGAAT AGAGAAAGGT GTTGAGTAAA 3420 
TTAAAAAGGA ATTAATTTTA GGAATGTTAT AAACATTTTT GTACCCAAAA TTCTTTTTGC 3480 
AGACAAGACT TACTTTGCCG CGGCGGGAGC GTTGCTGATA CTGCTGTTGT TAATTGCTTC 3540 
AAGGAAGATG ATCAAAAATG AGTAACCAGA AAATAAAATA AAATAACATA AAATAAAATA 3600 
AAAACTAG.™ TA^C^.TTAA AATAAAATAA AATGAGAAAT GCCTGTTAA7 SCACAGxxAA 3660 
TTCTAACGAT TCCATTTGTG AAGTTTTAAA GAGAGCACAA ATGCATAGTC ATTATGTCCA 3720 
TGCATATATA CACATATATG TACGTATATA TAATAAACGC ACACTTTCTT GTTCGTACAG 3780 
TTCTGAAGAA GCTACATTTA ATGAGTTTGA AGAATACTGT GATAATATTC ACAGAATCCC 3840 
TCTGATGCCT AACAGTAATT CAAATTTCAA GAGCAAAATT CCATTTAAAA AGAAATGTTA 3900 
CATCATTTTG CGTTTTTCTT TTTTTCTTTT TTTTTTCTTT TTTAGATATT GAACACATGC 3960 



WO 96/40766 






PCT/US96/09508 



AGCCATCAAC CCCCCTGGAT TATTCATGAT GCTACTTTGG TAAGTAAAAG CAATTCTGAT 4020 
TGTAGTGCTG ATGTAATTTT AGTCATTTTG CTTGCTGCAA TAAACGAGAA AATATATCAA 4080 
GCTT 4084 

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1115 amino acids 
(B) TYPE: amino acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Plasmodium vivax 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 




Met Lys Gly Lys Asn Arg Ser Leu Phe Val Leu Leu Val Leu Leu Leu 
1 5 10 15 

Leu His Lys Val Ser Tyr Lys Asp Asp Phe Ser Ile Thr Leu Ile Asn 
20 25 30 

Tyr His Glu Gly Lys Lys Tyr Leu Ile Ile Leu Lys Arg Lys Leu Glu 
35 40 45 

Lys Ala Asn Asn Arg Asp Val Cys Asn Phe Phe Leu His Phe Ser Gln 
50 55 60 

Val Asn Asn Val Leu Leu Glu Arg Thr Ile Glu Thr Leu Leu Glu Cys 
65 70 75 80 

Lys Asn Glu Tyr Val Lys Gly Glu Asn Gly Tyr Lys Leu Ala Lys Gly 
85 90 95 

His His Cys Val Glu Glu Asp Asn Leu Glu Arg Trp Leu Gln Gly Thr 
100 105 110 

Asn Glu Arg Arg Ser Glu Glu Asn Ile Lys Tyr Lys Tyr Gly Val Thr 
115 120 125 

Glu Leu Lys Ile Lys Tyr Ala Gln Met Asn Gly Lys Arg Ser Ser Arg 
130 135 140 

Ile Leu Lys Glu Ser Ile Tyr Gly Ala His Asn Phe Gly Gly Asn Ser 
145 150 155 160 

Tyr Met Glu Gly Lys Asp Gly Gly Asp Lys Thr Gly Glu Glu Lys Asp 
165 170 175 

Gly Glu His Lys Thr Asp Ser Lys Thr Asp Asn Gly Lys Gly Ala Asn 
180 185 190 

Asn Leu Val Met Leu Asp Tyr Glu Thr Ser Ser Asn Gly Gln Pro Ala 
195 200 205 

Gly Thr Leu Asp Asn Val Leu Glu Phe Val Thr Gly His Glu Gly Asn 
210 215 220 

Ser Arg Lys Asn Ser Ser Asn Gly Gly Asn Pro Tyr Asp Ile Asp His 
225 230 235 240 

Lys Lys Thr Ile Ser Ser Ala Ile Ile Asn His Ala Phe Leu Gln Asn 
245 250 255 

Thr Val Met Lys Asn Cys Asn Tyr Lys Arg Lys Arg Arg Glu Arg Asp 
260 265 270 

Trp Asp Cys Asn Thr Lys Lys Asp Val Cys Ile Pro Asp Arg Arg Tyr 
275 280 285 

Gln Leu Cys Met Lys Glu Leu Thr Asn Leu Val Asn Asn Thr Asp Thr 
290 295 300 

Asn Phe His Arg Asp Ile Thr Phe Arg Lys Leu Tyr Leu Lys Arg Lys 
305 310 315 320 

Leu Ile Tyr Asp Ala Ala Val Glu Gly Asp Leu Leu Leu Lys Leu Asn 
325 330 335 

Asn Tyr Arg Tyr Asn Lys Asp Phe Cys Lys Asp Ile Arg Trp Ser Leu 
340 345 350 

Gly Asp Phe Gly Asp Ile Ile Met Gly Thr Asp Met Glu Gly Ile Gly 
355 360 365 

Tyr Ser Lys Val Val Glu Asn Asn Leu Arg Ser Ile Phe Gly Thr Asp 
370 375 380 

Glu Lys Ala Gln Gln Arg Arg Lys Gln Trp Trp Asn Glu Ser Lys Ala 
385 390 395 400 

Gln Ile Trp Thr Ala Met Met Tyr Ser Val Lys Lys Arg Leu Lys Gly 
405 410 415 

Asn Phe Ile Trp Ile Cys Lys Leu Asn Val Ala Val Asn Ile Glu Pro 
420 425 430 

Gln Ile Tyr Arg Trp Ile Arg Glu Trp Gly Arg Asp Tyr Val Ser Glu 
435 440 445 

Leu Pro Thr Glu Val Gln Lys Leu Lys Glu Lys Cys Asp Gly Lys Ile 
450 455 460 

Asn Tyr Thr Asp Lys Lys Val Cys Lys Val Pro Pro Cys Gln Asn Ala 
465 470 475 480 

Cys Lys Ser Tyr Asp Gln Trp Ile Thr Arg Lys Lys Asn Gln Trp Asp 
485 490 495 

Val Leu Ser Asn Lys Phe Ile Ser Val Lys Asn Ala Glu Lys Val Gln 
500 505 510 

Thr Ala Gly Ile Val Thr Pro Tyr Asp Ile Leu Lys Gln Glu Leu Asp 
515 520 525 

Glu Phe Asn Glu Val Ala Phe Glu Asn Glu Ile Asn Lys Arg Asp Gly 
530 535 540 

Ala Tyr Ile Glu Leu Cys Val Cys Ser Val Glu Glu Ala Lys Lys Asn 
545 550 555 560 

Thr Gln Glu Val Val Thr Asn Val Asp Asn Ala Ala Lys Ser Gln Ala 
565 570 575 

Thr Asn Ser Asn Pro Ile Ser Gln Pro Val Asp Ser Ser Lys Ala Glu 
580 585 590 

Lys Val Pro Gly Asp Ser Thr His Gly Asn Val Asn Ser Gly Gln Asp 
595 600 605 

Ser Ser Thr Thr Gly Lys Ala Val Thr Gly Asp Gly Gln Asn Gly Asn 
610 615 620 

Gln Thr Pro Ala Glu Ser Asp Val Gln Arg Ser Asp Ile Ala Glu Ser 
625 630 635 640 

Val Ser Ala Lys Asn Val Asp Pro Gln Lys Ser Val Ser Lys Arg Ser 
645 650 655 

Asp Asp Thr Ala Ser Val Thr Gly Ile Ala Glu Ala Gly Lys Glu Asn 
660 665 670 

Leu Gly Ala Ser Asn Ser Arg Pro Ser Glu Ser Thr Val Glu Ala Asn 
675 680 685 

Ser Pro Gly Asp Asp Thr Val Asn Ser Ala Ser Ile Pro Val Val Ser 
690 695 700 

Gly Glu Asn Pro Leu Val Thr Pro Tyr Asn Gly Leu Arg His Ser Lys 
705 710 715 720 

Asp Asn Ser Asp Ser Asp Gly Pro Ala Glu Ser Met Ala Asn Pro Asp 
725 730 735 

Ser Asn Ser Lys Gly Glu Thr Gly Lys Gly Glu Asp Asn Asp Met Ala 
740 745 750 

Lys Ala Thr Lys Asp Ser Ser Asn Ser Ser Asp Gly Thr Ser Ser Ala 
755 760 765 

Thr Gly Asp Thr Thr Asp Ala Val Asp Arg Glu Ile Asn Lys Gly Val 
770 775 780 

Pro Glu Asp Arg Asp Lys Thr Val Gly Ser Lys Asp Gly Gly Gly Glu 
785 790 795 800 

Asp Asn Ser Ala Asn Lys Asp Ala Ala Thr Val Val Gly Glu Asp Arg 
805 810 815 

Ile Arg Glu Asn Ser Ala Gly Gly Ser Thr Asn Asp Arg Ser Lys Asn 
820 825 830 

Asp Thr Glu Lys Asn Gly Ala Ser Thr Pro Asp Ser Lys Gln Ser Glu 
835 840 845 

Asp Ala Thr Ala Leu Ser Lys Thr Glu Ser Leu Glu Ser Thr Glu Ser 
850 855 860 

Gly Asp Arg Thr Thr Asn Asp Thr Thr Asn Ser Leu Glu Asn Lys Asn 
865 870 875 880 

Gly Gly Lys Glu Lys Asp Leu Gln Lys His Asp Phe Lys Ser Asn Asp 
885 890 895 

Thr Pro Asn Glu Glu Pro Asn Ser Asp Gln Thr Thr Asp Ala Glu Gly 
900 905 910 

His Asp Arg Asp Ser Ile Lys Asn Asp Lys Ala Glu Arg Arg Lys His 
915 920 925 

Met Asn Lys Asp Thr Phe Thr Lys Asn Thr Asn Ser His His Leu Asn 
930 935 940 

Ser Asn Asn Asn Leu Ser Asn Gly Lys Leu Asp Ile Lys Glu Tyr Lys 
945 950 955 960 

Tyr Arg Asp Val Lys Ala Thr Arg Glu Asp Ile Ile Leu Met Ser Ser 
965 970 975 

Val Arg Lys Cys Asn Asn Asn Ile Ser Leu Glu Tyr Cys Asn Ser Val 
980 985 990 

Glu Asp Lys Ile Ser Ser Asn Thr Cys Ser Arg Glu Lys Ser Lys Asn 
995 1000 1005 

Leu Cys Cys Ser Ile Ser Asp Phe Cys Leu Asn Tyr Phe Asp Val Tyr 
1010 1015 1020 

Ser Tyr Glu Tyr Leu Ser Cys Met Lys Lys Glu Phe Glu Asp Pro Ser 
1025 1030 1035 1040 

Tyr Lys Cys Phe Thr Lys Gly Gly Phe Lys Ile Asp Lys Thr Tyr Phe 
1045 1050 1055 

Ala Ala Ala Gly Ala Leu Leu Ile Leu Leu Leu Ile Ala Ser Arg Lys 
1060 1065 1070 

Met Ile Lys Asn Asp Ser Glu Glu Ala Thr Phe Asn Glu Phe Glu Glu 
1075 1080 1085 

Tyr Cys Asp Asn Ile His Arg Ile Pro Leu Met Pro Asn Asn Ile Glu 
1090 1095 1100 

His Met Gln Pro Ser Thr Pro Leu Asp Tyr Ser 
1105 1110 1115 
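
The correspondence between SEQ ID NO:1 and SEQ ID NO:2 can be spot-checked by translating the start of the open reading frame. The sketch below is an illustrative aid, not part of the listing: it assumes the standard genetic code, uses the 48 bases beginning at the ATG near position 227 of SEQ ID NO:1, and introduces helper names of our own.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string (spaces ignored) until a stop codon or the end."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 16 codons of the ORF in SEQ ID NO:1, written codon by codon.
orf_start = "ATG AAA GGA AAA AAC CGC TCT TTA TTT GTT CTC CTA GTT TTA TTA TTG"
print(translate(orf_start))
```

The one-letter output corresponds to residues 1 to 16 of SEQ ID NO:2 (Met Lys Gly Lys Asn Arg Ser Leu Phe Val Leu Leu Val Leu Leu Leu).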

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4507 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Plasmodium falciparum 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

TATATATATA TATATATATA GATAATAACA TATAAATATA TTCAATGTGC ATACAATGAA 60 
ATGTAATATT AGTATATATT TTTTTGCTTC CTTCTTTGTG TTATATTTTG CAAAAGCTAG 120 
GAATGAATAT GATATAAAAG AGAATGAAAA ATTTTTAGAC GTGTATAAAG AAAAATTTAA 180 

TGAATTAGAT AAAAAGAAAT ATGGAAATGT TCAAAAAACT GATAAGAAAA TATTTACTTT 240 
TATAGAAAAT AAATTAGATA TTTTAAATAA TTCAAAATTT AATAAAAGAT GGAAGAGTTA 300 
TGGAACTCCA GATAATATAG ATAAAAATAT GTCTTTAATA AATAAACATA ATAATGAAGA 360 
AATGTTTAAC AACAATTATC AATCATTTTT ATCGACAAGT TCATTAATAA AGCAAAATAA 420 
A7ATG7TCCT ATTAaculti'G TAUi xuTiiTU TAliGATATTA AGTTTCCTGG ATTCTAGAAT 480 

TAATAATGGA AGAAATACTT CATCTAATAA CGAAGTTTTA AGTAATTGTA GGGAAAAAAG 540 
GAAAGGAATG AAATGGGATT GTAAAAAGAA AAATGATAGA AGCAACTATG TATGTATTCC 600 
TGATCGTAGA ATCCAATTAT GCATTGTTAA TCTTAGCATT ATTAAAACAT ATACAAAAGA 660 
GACCATGAAG GATCATTTCA TTGAAGCCTC TAAAAAAGAA TCTCAACTTT TGCTTAAAAA 720 
AAATGATAAC AAATATAATT CTAAATTTTG TAATGATTTG AAGAATAGTT TTTTAGATTA 780 
TGGACATCTT GCTATGGGAA ATGATATGGA TTTTGGAGGT TATTCAACTA AGGCAGAAAA 840 
CAAAATTCAA GAAGTTTTTA AAGGGGCTCA TGGGGAAATA AGTGAACATA AAATTAAAAA 900 
TTTTAGAAAA GAATGGTGGA ATGAATTTAG AGAGAAACTT TGGGAAGCTA TGTTATCTGA 960 
GCATAAAAAT AATATAAATA ATTGTAAAAA TATTCCCCAA GAAGAATTAC AAATTACTCA 1020 
ATGGATAAAA GAATGGCATG GAGAATTTTT GCTTGAAAGA GATAATAGAT CAAAATTGCC 1080 
AAAAAGTAAA TGTAAAAATA ATACATTATA TGAAGCATGT GAGAAGGAAT GTATTGATCC 1140 
ATGTATGAAA TATAGAGATT GGATTATTAG AAGTAAATTT GAATGGCATA CGTTATCGAA 1200 
AGAATATGAA ACTCAAAAAG TTCCAAAGGA AAATGCGGAA AATTATTTAA TCAAAATTTC 1260 
AGAAAACAAG AATGATGCTA AAGTAAGTTT ATTATTGAAT AATTGTGATG CTGAATATTC 1320 
AAAATATTGT GATTGTAAAC ATACTACTAC TCTCGTTAAA AGCGTTTTAA ATGGTAACGA 1380 
CAATACAATT AAGGAAAAGC GTGAACATAT TGATTTAGAT GATTTTTCTA AATTTGGATG 1440 
TGATAAAAAT TCCGTTGATA CAAACACAAA GGTGTGGGAA TGTAAAAACC CTTATATATT 1500 
ATCCACTAAA GATGTATGTG TACCTCCGAG GAGGCAAGAA TTATGTCTTG GAAACATTGA 1560 
TAGAATATAC GATAAAAACC TATTAATGAT AAAAGAGCAT ATTCTTGCTA TTGCAATATA 1620 
TGAATCAAGA ATATTGAAAC GAAAATATAA GAATAAAGAT GATAAAGAAG TTTGTAAAAT 1680 
CATAAATAAA ACTTTCGCTG ATATAAGAGA TATTATAGGA GGTACTGATT ATTGGAATGA 1740 
TTTGAGCAAT AGAAAATTAG TAGGAAAAAT TAACACAAAT TCAAAATATG TTCACAGGAA 1800 
TAAAAAAAAT GATAAGCTTT TTCGTGATGA GTGGTGGAAA GTTATTAAAA AAGATGTATG 1860 
GAATGTGATA TCATGGGTAT TCAAGGATAA AACTGTTTGT AAAGAAGATG ATATTGAAAA 1920 

TATACCACAA TTCTTCAGAT GGTTTAGTGA ATGGGGTGAT GATTATTGCC AGGATAAAAC 1980 
AAAAATGATA GAGACTCTGA AGGTTGAATG CAAAGAAAAA CCTTGTGAAG ATGACAATTG 2040 
TAAAAGTAAA TGTAATTCAT ATAAAGAATG GATATCAAAA AAAAAAGAAG AGTATAATAA 2100 
ACAAGCCAAA CAATACCAAG AATATCAAAA AGGAAATAAT TACAAAATGT ATTCTGAATT 2160 
TAAATCTATA AAACCAGAAG TTTATTTAAA GAAATACTCG GAAAAATGTT CTAACCTAAA 2220 

TTTCGAAGAT GAATTTAAGG AAGAATTACA TTCAGATTAT AAAAATAAAT GTACGATGTG 2280 
TCCAGAAGTA AAGGATGTAC CAATTTCTAT AATAAGAAAT AATGAACAAA CTTCGCAAGA 2340 
AGCAGTTCCT GAGGAAAACA CTGAAATAGC ACACAGAACG GAAACTCCAT CTATCTCTGA 2400 
AGGACCAAAA GGAAATGAAC AAAAAGAACG TGATGACGAT AGTTTGAGTA AAATAAGTGT 2460 
ATCACCAGAA AATTCAAGAC CTGAAACTGA TGCTAAAGAT ACTTCTAACT TGTTAAAATT 2520 

AAAAGGAGAT GTTGATATTA GTATGCCTAA AGCAGTTATT GGGAGCAGTC CTAATGATAA 2580 
TATAAATGTT ACTGAACAAG GGGATAATAT TTCCGGGGTG AATTCTAAAC CTTTATCTGA 2640 
TGATGTACGT CCAGATAAAA AGGAATTAGA AGATCAAAAT AGTGATGAAT OGGAAGAAAC 2700 
TGTAGTAAAT CATATATCAA AAAGTCCATC TATAAATAAT GGAGATGATT CAGGCAGTGG 2760 
AAGTGCAACA GTGAGTGAAT CTAGTAGTTC AAATACTGGA TTGTCTATTG ATGATGATAG 2820 

AAATGGTGAT ACATTTGTTC GAACACAAGA TACAGCAAAT ACTGAAGATG TTATTAGAAA 2880 
AGAAAATGCT GACAAGGATG AAGATGAAAA AGGCGCAGAT GAAGAAAGAC ATAGTACTTC 2940 
TGAAAGCTTA AGTTCACCTG AAGAAAAAAT GTTAACTGAT AATGAAGGAG GAAATAGTTT 3000 
AAATCATGAA GAGGTGAAAG AACATACTAG TAATTCTGAT AATGTTCAAC AGTCTGGAGG 3060 
AATTGTTAAT ATGAATGTTG AGAAAGAACT AAAAGATACT TTAGAAAATC CTTCTAGTAG 3120 

CTTGGATGAA GGAAAAGCAC ATGAAGAATT ATCAGAACCA AATCTAAGCA GTGACCAAGA 3180 
TATGTCTAAT ACACCTGGAC CTTTGGATAA CACCAGTGAA GAAACTACAG AAAGAATTAG 3240 
TAATAATGAA TATAAAGTTA ACGAGAGGGA AGATGAGAGA ACGCTTACTA AGGAATATGA 3300 
AGATATTGTT TTGAAAAGTC ATATGAATAG AGAATCAGAC GATGGTGAAT TATATGAOGA 3360 
AAATTCAGAC TTATCTACTG TAAATGATGA ATCAGAAGAC GCTGAAGCAA AAATGAAAGG 3420 

AAATGATACA TCTGAAATGT CGCATAATAG TAGTCAACAT ATTGAGAGTG ATCAACAGAA 3480 
AAACGATATG AAAACTGTTG GTGATTTGGG AACCACACAT GTACAAAACG AAATTAGTGT 3540 
TCCTGTTACA GGAGAAATTG ATGAAAAATT AAGGGAAAGT AAAGAATCAA AAATTCATAA 3600 
GGCTGAAGAG GAAAGATTAA GTCATACAGA TATACATAAA ATTAATCCTG AAGATAGAAA 3660 
TAGTAATACA TTACATTTAA AAGATATAAG AAATGAGGAA AACGAAAGAC ACTTAACTAA 3720 

TCAAAACATT AATATTAGTC AAGAAAGGGA TTTGCAAAAA CATGGATTCC ATACCATGAA 3780 
TAATCTACAT GGAGATGGAG TTTCCGAAAG AAGTCAAATT AATCATAGTC ATCATGGAAA 3840 
CAGACAAGAT CGGGGGGGAA ATTCTGGGAA TGTTTTAAAT ATGAGATCTA ATAATAATAA 3900 
TTTTAATAAT ATTCCAAGTA GATATAATTT ATATGATAAA AAATTAGATT TAGATCTTTA 3960 
TGAAAACAGA AATGATAGTA CAACAAAAGA ATTAATAAAG AAATTAGCAG AAATAAATAA 4020 

ATGTGAGAAC GAAATTTCTG TAAAATATTG TGACCATATG ATTCATGAAG AAATCCCATT 4080 
AAAAACATGC ACTAAAGAAA AAACAAGAAA TCTGTGTTGT GCAGTATCAG ATTACTGTAT 4140 
GAGCTATTTT ACATATGATT CAGAGGAATA TTATAATTGT ACGAAAAGGG AATTTGATGA 4200 
TCCATCTTAT ACATGTTTCA GAAAGGAGGC TTTTTCAAGT ATGATATTCA AATTTTTAAT 4260 
AACA*J ITAA A ATATATT ATT ATTTTTAT AC TTACAAAACT GCAAAAGTAA CAATAAAAAA 4320 

AATTAATTTC TCATTAATTT TTTTTTTCTT TTTTTCTTTT TAGGTATGCC ATATTATGCA 4380 
GGAGCAGGTG TGTTATTTAT TATATTGGTT ATTTTAGGTG CTTCACAAGC CAAATATCAA 4440 
AGGTTAGAAA AAATAAATAA AAATAAAATT GAGAAGAATG TAAATTAAAT ATAGAATTCG 4500 
AGCTCGG 4507 
(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1435 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Plasmodium falciparum 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



Met Lys Cys Asn Ile Ser Ile Tyr Phe Phe Ala Ser Phe Phe Val Leu 
1 5 10 15 

Tyr Phe Ala Lys Ala Arg Asn Glu Tyr Asp Ile Lys Glu Asn Glu Lys 
20 25 30 

Phe Leu Asp Val Tyr Lys Glu Lys Phe Asn Glu Leu Asp Lys Lys Lys 
35 40 45 

Tyr Gly Asn Val Gln Lys Thr Asp Lys Lys Ile Phe Thr Phe Ile Glu 
50 55 60 

Asn Lys Leu Asp Ile Leu Asn Asn Ser Lys Phe Asn Lys Arg Trp Lys 
65 70 75 80 

Ser Tyr Gly Thr Pro Asp Asn Ile Asp Lys Asn Met Ser Leu Ile Asn 
85 90 95 

Lys His Asn Asn Glu Glu Met Phe Asn Asn Asn Tyr Gln Ser Phe Leu 
100 105 110 

Ser Thr Ser Ser Leu Ile Lys Gln Asn Lys Tyr Val Pro Ile Asn Ala 
115 120 125 

Val Arg Val Ser Arg Ile Leu Ser Phe Leu Asp Ser Arg Ile Asn Asn 
130 135 140 

Gly Arg Asn Thr Ser Ser Asn Asn Glu Val Leu Ser Asn Cys Arg Glu 
145 150 155 160 

Lys Arg Lys Gly Met Lys Trp Asp Cys Lys Lys Lys Asn Asp Arg Ser 
165 170 175 

Asn Tyr Val Cys Ile Pro Asp Arg Arg Ile Gln Leu Cys Ile Val Asn 
180 185 190 

Leu Ser Ile Ile Lys Thr Tyr Thr Lys Glu Thr Met Lys Asp His Phe 
195 200 205 

Ile Glu Ala Ser Lys Lys Glu Ser Gln Leu Leu Leu Lys Lys Asn Asp 
210 215 220 

Asn Lys Tyr Asn Ser Lys Phe Cys Asn Asp Leu Lys Asn Ser Phe Leu 
225 230 235 240 

Asp Tyr Gly His Leu Ala Met Gly Asn Asp Met Asp Phe Gly Gly Tyr 
245 250 255 

Ser Thr Lys Ala Glu Asn Lys Ile Gln Glu Val Phe Lys Gly Ala His 
260 265 270 

Gly Glu Ile Ser Glu His Lys Ile Lys Asn Phe Arg Lys Glu Trp Trp 
275 280 285 

Asn Glu Phe Arg Glu Lys Leu Trp Glu Ala Met Leu Ser Glu His Lys 
290 295 300 

Asn Asn Ile Asn Asn Cys Lys Asn Ile Pro Gln Glu Glu Leu Gln Ile 
305 310 315 320 

Thr Gln Trp Ile Lys Glu Trp His Gly Glu Phe Leu Leu Glu Arg Asp 
325 330 335 

Asn Arg Ser Lys Leu Pro Lys Ser Lys Cys Lys Asn Asn Thr Leu Tyr 
340 345 350 

Glu Ala Cys Glu Lys Glu Cys Ile Asp Pro Cys Met Lys Tyr Arg Asp 
355 360 365 

Trp Ile Ile Arg Ser Lys Phe Glu Trp His Thr Leu Ser Lys Glu Tyr 
370 375 380 

Glu Thr Gln Lys Val Pro Lys Glu Asn Ala Glu Asn Tyr Leu Ile Lys
385 390 395 400

Ile Ser Glu Asn Lys Asn Asp Ala Lys Val Ser Leu Leu Leu Asn Asn
405 410 415

Cys Asp Ala Glu Tyr Ser Lys Tyr Cys Asp Cys Lys His Thr Thr Thr
420 425 430

Leu Val Lys Ser Val Leu Asn Gly Asn Asp Asn Thr Ile Lys Glu Lys
435 440 445

Arg Glu His Ile Asp Leu Asp Asp Phe Ser Lys Phe Gly Cys Asp Lys
450 455 460

Asn Ser Val Asp Thr Asn Thr Lys Val Trp Glu Cys Lys Asn Pro Tyr
465 470 475 480

Ile Leu Ser Thr Lys Asp Val Cys Val Pro Pro Arg Arg Gln Glu Leu
485 490 495

Cys Leu Gly Asn Ile Asp Arg Ile Tyr Asp Lys Asn Leu Leu Met Ile
500 505 510

Lys Glu His Ile Leu Ala Ile Ala Ile Tyr Glu Ser Arg Ile Leu Lys
515 520 525

Arg Lys Tyr Lys Asn Lys Asp Asp Lys Glu Val Cys Lys Ile Ile Asn
530 535 540

Lys Thr Phe Ala Asp Ile Arg Asp Ile Ile Gly Gly Thr Asp Tyr Trp
545 550 555 560

Asn Asp Leu Ser Asn Arg Lys Leu Val Gly Lys Ile Asn Thr Asn Ser
565 570 575

Lys Tyr Val His Arg Asn Lys Lys Asn Asp Lys Leu Phe Arg Asp Glu
580 585 590

Trp Trp Lys Val Ile Lys Lys Asp Val Trp Asn Val Ile Ser Trp Val
595 600 605

Phe Lys Asp Lys Thr Val Cys Lys Glu Asp Asp Ile Glu Asn Ile Pro
610 615 620

Gln Phe Phe Arg Trp Phe Ser Glu Trp Gly Asp Asp Tyr Cys Gln Asp
625 630 635 640

Lys Thr Lys Met Ile Glu Thr Leu Lys Val Glu Cys Lys Glu Lys Pro
645 650 655

Cys Glu Asp Asp Asn Cys Lys Ser Lys Cys Asn Ser Tyr Lys Glu Trp
660 665 670

Ile Ser Lys Lys Lys Glu Glu Tyr Asn Lys Gln Ala Lys Gln Tyr Gln
675 680 685

Glu Tyr Gln Lys Gly Asn Asn Tyr Lys Met Tyr Ser Glu Phe Lys Ser
690 695 700

Ile Lys Pro Glu Val Tyr Leu Lys Lys Tyr Ser Glu Lys Cys Ser Asn
705 710 715 720

Leu Asn Phe Glu Asp Glu Phe Lys Glu Glu Leu His Ser Asp Tyr Lys
725 730 735

Asn Lys Cys Thr Met Cys Pro Glu Val Lys Asp Val Pro Ile Ser Ile
740 745 750

Ile Arg Asn Asn Glu Gln Thr Ser Gln Glu Ala Val Pro Glu Glu Asn
755 760 765

Thr Glu Ile Ala His Arg Thr Glu Thr Pro Ser Ile Ser Glu Gly Pro
770 775 780

Lys Gly Asn Glu Gln Lys Glu Arg Asp Asp Asp Ser Leu Ser Lys Ile
785 790 795 800

Ser Val Ser Pro Glu Asn Ser Arg Pro Glu Thr Asp Ala Lys Asp Thr
805 810 815

Ser Asn Leu Leu Lys Leu Lys Gly Asp Val Asp Ile Ser Met Pro Lys
820 825 830

Ala Val Ile Gly Ser Ser Pro Asn Asp Asn Ile Asn Val Thr Glu Gln
835 840 845

Gly Asp Asn Ile Ser Gly Val Asn Ser Lys Pro Leu Ser Asp Asp Val
850 855 860

Arg Pro Asp Lys Lys Glu Leu Glu Asp Gln Asn Ser Asp Glu Ser Glu
865 870 875 880

Glu Thr Val Val Asn His Ile Ser Lys Ser Pro Ser Ile Asn Asn Gly



885 890 895

Asp Asp Ser Gly Ser Gly Ser Ala Thr Val Ser Glu Ser Ser Ser Ser
900 905 910

Asn Thr Gly Leu Ser Ile Asp Asp Asp Arg Asn Gly Asp Thr Phe Val
915 920 925

Arg Thr Gln Asp Thr Ala Asn Thr Glu Asp Val Ile Arg Lys Glu Asn
930 935 940

Ala Asp Lys Asp Glu Asp Glu Lys Gly Ala Asp Glu Glu Arg His Ser
945 950 955 960

Thr Ser Glu Ser Leu Ser Ser Pro Glu Glu Lys Met Leu Thr Asp Asn
965 970 975

Glu Gly Gly Asn Ser Leu Asn His Glu Glu Val Lys Glu His Thr Ser
980 985 990

Asn Ser Asp Asn Val Gln Gln Ser Gly Gly Ile Val Asn Met Asn Val
995 1000 1005

Glu Lys Glu Leu Lys Asp Thr Leu Glu Asn Pro Ser Ser Ser Leu Asp
1010 1015 1020

Glu Gly Lys Ala His Glu Glu Leu Ser Glu Pro Asn Leu Ser Ser Asp
1025 1030 1035 1040

Gln Asp Met Ser Asn Thr Pro Gly Pro Leu Asp Asn Thr Ser Glu Glu
1045 1050 1055

Thr Thr Glu Arg Ile Ser Asn Asn Glu Tyr Lys Val Asn Glu Arg Glu
1060 1065 1070

Asp Glu Arg Thr Leu Thr Lys Glu Tyr Glu Asp Ile Val Leu Lys Ser
1075 1080 1085

His Met Asn Arg Glu Ser Asp Asp Gly Glu Leu Tyr Asp Glu Asn Ser
1090 1095 1100

Asp Leu Ser Thr Val Asn Asp Glu Ser Glu Asp Ala Glu Ala Lys Met
1105 1110 1115 1120

Lys Gly Asn Asp Thr Ser Glu Met Ser His Asn Ser Ser Gln His Ile
1125 1130 1135

Glu Ser Asp Gln Gln Lys Asn Asp Met Lys Thr Val Gly Asp Leu Gly
1140 1145 1150

Thr Thr His Val Gln Asn Glu Ile Ser Val Pro Val Thr Gly Glu Ile
1155 1160 1165

Asp Glu Lys Leu Arg Glu Ser Lys Glu Ser Lys Ile His Lys Ala Glu
1170 1175 1180

Glu Glu Arg Leu Ser His Thr Asp Ile His Lys Ile Asn Pro Glu Asp
1185 1190 1195 1200

Arg Asn Ser Asn Thr Leu His Leu Lys Asp Ile Arg Asn Glu Glu Asn
1205 1210 1215

Glu Arg His Leu Thr Asn Gln Asn Ile Asn Ile Ser Gln Glu Arg Asp
1220 1225 1230

Leu Gln Lys His Gly Phe His Thr Met Asn Asn Leu His Gly Asp Gly
1235 1240 1245

Val Ser Glu Arg Ser Gln Ile Asn His Ser His His Gly Asn Arg Gln
1250 1255 1260

Asp Arg Gly Gly Asn Ser Gly Asn Val Leu Asn Met Arg Ser Asn Asn
1265 1270 1275 1280

Asn Asn Phe Asn Asn Ile Pro Ser Arg Tyr Asn Leu Tyr Asp Lys Lys
1285 1290 1295

Leu Asp Leu Asp Leu Tyr Glu Asn Arg Asn Asp Ser Thr Thr Lys Glu
1300 1305 1310

Leu Ile Lys Lys Leu Ala Glu Ile Asn Lys Cys Glu Asn Glu Ile Ser
1315 1320 1325

Val Lys Tyr Cys Asp His Met Ile His Glu Glu Ile Pro Leu Lys Thr
1330 1335 1340

Cys Thr Lys Glu Lys Thr Arg Asn Leu Cys Cys Ala Val Ser Asp Tyr
1345 1350 1355 1360

Cys Met Ser Tyr Phe Thr Tyr Asp Ser Glu Glu Tyr Tyr Asn Cys Thr
1365 1370 1375

Lys Arg Glu Phe Asp Asp Pro Ser Tyr Thr Cys Phe Arg Lys Glu Ala
1380 1385 1390

Phe Ser Ser Met Ile Phe Lys Phe Leu Ile Thr Asn Lys Ile Tyr Tyr
1395 1400 1405

Tyr Phe Tyr Thr Tyr Lys Thr Ala Lys Val Thr Ile Lys Lys Ile Asn
1410 1415 1420

Phe Ser Leu Ile Phe Phe Phe Phe Phe Ser Phe
1425 1430 1435



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2288 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



CACTTTATGC TTCCGGCTCG TATGTTGTGT GGAATTGTGA GCGGATAACA ATTTCACACA 60 
GGAAACAGCT ATGACCATGA TTACGCCAAG CTCTAATACG ACTCACTATA GGGAAAGCTG 120
GTACGCCTGC AGGTCCGGTC CGGAATTCAA TAAAATATTT CCAGAAAGGA ATGTGCAAAT 180 
TCACATATCC AATATATTCA AGGAATATAA AGAAAATAAT GTAGATATCA TATTTGGAAC 240 

GTTGAATTAT GAATATAATA ATTTCTGTAA AGAAAAACCT GAATTAGTAT CTGCTGCCAA 300
GTATAATCTG AAAGCTCCAA ATGCTAAATC CCCTAGAATA TACAAATCTA AGGAGCATGA 360 
AGAATCAAGT GTGTTTGGTT GCAAAACGAA AATCAGTAAA GTTAAAAAAA AATGGAATTG 420 
TTATAGTAAT AATAAAGTAA CTAAACCTGA AGGTGTATGT GGACCACCAA GAAGGCAACA 480 
ATTATGTCTT GGATATATAT TTTTGATTCG CGACGGTAAC GAGGAAGGAT TAAAAGATCA 540 

TATTAATAAG GCAGCTAATT ATGAGGCAAT GCATTTAAAA GAGAAATATG AGAATGCTGG 600
TGGTGATAAA ATTTGCAATG CTATATTGGG AAGTTATGCA GATATTGGAG ATATTGTAAG 660 
AGGTTTGGAT GTTTGGAGGG ATATAAATAC TAATAAATTA TCAGAAAAAT TCCAAAAAAT 720 
TTTTATGGGT GGTGGTAATT CTAGGAAAAA ACAAAACGAT AATAATGAAC GTAATAAATG 780 
GTGGGAAAAA CAAAGGAATT TAATATGGTC TAGTATGGTA AAACACATTC CAAAAGGAAA 840 

AACATGTAAA CGTCATAATA ATTTTGAGAA AATTCCTCAA TTTTTGAGAT GGTTAAAAGA 900
ATGGGGTGAT GAATTTTGTG AGGAAATGGG TACGGAAGTC AAGCAATTAG AGAAAATATG 960 
TGAAAATAAA AATTGTTCGG AAAAAAAATG TAAAAATGCA TGTAGTTCCT ATGAAAAATG 1020 
GATAAAGGAA CGAAAAAATG AATATAATTT GCAATCAAAG AAATTTGATA GTGATAAAAA 1080 
ATTAAATAAA AAAAACAATC TTTATAATAA ATTTGAGGAT TCTAAAGCTT ATTTAAGGAG 1140 

TGAATCAAAA CAGTGCTCAA ATATAGAATT TAATGATGAA ACATTTACAT TTCCTAATAA 1200
ATATAAAGAG GCTTGTATGG TATGTGAAAA TCCTTCATCT TCGAAAGCTC TTAAACCTAT 1260 
AAAAACGAAT GTGTTTCCTA TAGAGGAATC AAAAAAATCT GAGTTATCAA GTTTAACAGA 1320 
TAAATCTAAG AATACTCCTA ATAGTTCTGG TGGGGGAAAT TATGGAGATA GACAAATATC 1380 
AAAAAGAGAC GATGTTCATC ATGATGGTCC TAAGGAAGTG AAATCCGGAG AAAAAGAGGT 1440 

50 ACCAAAAATA GATGCAGCTG TTAAAACAGA AAATGAATTT ACOTCTAATC GAAACGATAT 1500 
TGAAGGAAAG GAAAAAAGTA AAGGTGATCA TTCTTCTCCT GTTCATTCTA AAGATATAAA 1560 
AAATGAGGAA CCACAAAGGG TGGTGTCTGA AAATTTACCT AAAATTGAAG AGAAAATGGA 1620 
ATCTTCTGAT TCTATACCAA TTACTCATAT AGAAGCTGAA AAGGGTCAGT CTTCTAATTC 1680 
TAGCGATAAT GATCCTGCAG TAGTAAGTGG TAGAGAATCT AAAGATGTAA ATCTTCATAC 1740 

TTCTGAAAGG ATTAAAGAAA ATGAAGAAGG TGTGATTAAA ACAGATGATA GTTCAAAAAG 1800
TATTGAAATT TCTAAAATAC CATCTGACCA AAATAATCAT AGTGATTTAT CACAGAATGC 1860 
AAATGAGGAC TCTAATCAAG GGAATAAGGA AACAATAAAT CCTCCTTCTA CAGAAAAAAA 1920 
TCTCAAAGAA ATTCATTATA AAACATCTGA TTCTGATGAT CATGGTTCTA AAATTAAAAG 1980 

^» »* » a ^nnwvaunvTutn n J. iLll 1 Va/i irtftrtrt AAALTViAAAU ^ U 4 U 

TGCAGCGATT GGTGATAAAA ATCATGAATC AGTAAAAAGC GCTGATATTT TTCAATCTGA 2100
GATTCATAAT TCTGATAATA GAGATAGAAT TGTTTCTGAA AGTGTAGTTC AGGATTCTTC 2160 
AGGAAGCTCT ATGAGTACTG AATCTATACG TACTGATAAC AAGGATTTTA AAACAAGTGA 2220 
GGATATTGCA CCTTCTATTA ATGGTCGGAA TTCCCGGGTC GACGAGCTCA CTAGTCGGCG 2280 
GCCGCTCT 2288

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 749 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



Ala Asp Asn Asn Phe Thr Gln Glu Thr Ala Met Thr Met Ile Thr Pro
1 5 10 15

Ser Ser Asn Thr Thr His Tyr Arg Glu Ser Trp Tyr Ala Cys Arg Ser
20 25 30

Gly Pro Glu Phe Asn Lys Ile Phe Pro Glu Arg Asn Val Gln Ile His
35 40 45

Ile Ser Asn Ile Phe Lys Glu Tyr Lys Glu Asn Asn Val Asp Ile Ile
50 55 60

Phe Gly Thr Leu Asn Tyr Glu Tyr Asn Asn Phe Cys Lys Glu Lys Pro
65 70 75 80

Glu Leu Val Ser Ala Ala Lys Tyr Asn Leu Lys Ala Pro Asn Ala Lys
85 90 95

Ser Pro Arg Ile Tyr Lys Ser Lys Glu His Glu Glu Ser Ser Val Phe
100 105 110

Gly Cys Lys Thr Lys Ile Ser Lys Val Lys Lys Lys Trp Asn Cys Tyr
115 120 125

Ser Asn Asn Lys Val Thr Lys Pro Glu Gly Val Cys Gly Pro Pro Arg
130 135 140

Arg Gln Gln Leu Cys Leu Gly Tyr Ile Phe Leu Ile Arg Asp Gly Asn
145 150 155 160

Glu Glu Gly Leu Lys Asp His Ile Asn Lys Ala Ala Asn Tyr Glu Ala
165 170 175

Met His Leu Lys Glu Lys Tyr Glu Asn Ala Gly Gly Asp Lys Ile Cys
180 185 190

Asn Ala Ile Leu Gly Ser Tyr Ala Asp Ile Gly Asp Ile Val Arg Gly
195 200 205

Leu Asp Val Trp Arg Asp Ile Asn Thr Asn Lys Leu Ser Glu Lys Phe
210 215 220

Gln Lys Ile Phe Met Gly Gly Gly Asn Ser Arg Lys Lys Gln Asn Asp
225 230 235 240

Asn Asn Glu Arg Asn Lys Trp Trp Glu Lys Gln Arg Asn Leu Ile Trp
245 250 255

Ser Ser Met Val Lys His Ile Pro Lys Gly Lys Thr Cys Lys Arg His
260 265 270

Asn Asn Phe Glu Lys Ile Pro Gln Phe Leu Arg Trp Leu Lys Glu Trp
275 280 285

Gly Asp Glu Phe Cys Glu Glu Met Gly Thr Glu Val Lys Gln Leu Glu
290 295 300

Lys Ile Cys Glu Asn Lys Asn Cys Ser Glu Lys Lys Cys Lys Asn Ala
305 310 315 320

Cys Ser Ser Tyr Glu Lys Trp Ile Lys Glu Arg Lys Asn Glu Tyr Asn
325 330 335

Leu Gln Ser Lys Lys Phe Asp Ser Asp Lys Lys Leu Asn Lys Lys Asn
340 345 350

Asn Leu Tyr Asn Lys Phe Glu Asp Ser Lys Ala Tyr Leu Arg Ser Glu
355 360 365

Ser Lys Gln Cys Ser Asn Ile Glu Phe Asn Asp Glu Thr Phe Thr Phe
370 375 380

Pro Asn Lys Tyr Lys Glu Ala Cys Met Val Cys Glu Asn Pro Ser Ser
385 390 395 400

Ser Lys Ala Leu Lys Pro Ile Lys Thr Asn Val Phe Pro Ile Glu Glu
405 410 415

Ser Lys Lys Ser Glu Leu Ser Ser Leu Thr Asp Lys Ser Lys Asn Thr
420 425 430

Pro Asn Ser Ser Gly Gly Gly Asn Tyr Gly Asp Arg Gln Ile Ser Lys
435 440 445

Arg Asp Asp Val His His Asp Gly Pro Lys Glu Val Lys Ser Gly Glu
450 455 460

Lys Glu Val Pro Lys Ile Asp Ala Ala Val Lys Thr Glu Asn Glu Phe
465 470 475 480

Thr Ser Asn Arg Asn Asp Ile Glu Gly Lys Glu Lys Ser Lys Gly Asp
485 490 495

His Ser Ser Pro Val His Ser Lys Asp Ile Lys Asn Glu Glu Pro Gln
500 505 510

Arg Val Val Ser Glu Asn Leu Pro Lys Ile Glu Glu Lys Met Glu Ser
515 520 525

Ser Asp Ser Ile Pro Ile Thr His Ile Glu Ala Glu Lys Gly Gln Ser
530 535 540

Ser Asn Ser Ser Asp Asn Asp Pro Ala Val Val Ser Gly Arg Glu Ser
545 550 555 560

Lys Asp Val Asn Leu His Thr Ser Glu Arg Ile Lys Glu Asn Glu Glu
565 570 575

Gly Val Ile Lys Thr Asp Asp Ser Ser Lys Ser Ile Glu Ile Ser Lys
580 585 590

Ile Pro Ser Asp Gln Asn Asn His Ser Asp Leu Ser Gln Asn Ala Asn
595 600 605

Glu Asp Ser Asn Gln Gly Asn Lys Glu Thr Ile Asn Pro Pro Ser Thr
610 615 620

Glu Lys Asn Leu Lys Glu Ile His Tyr Lys Thr Ser Asp Ser Asp Asp
625 630 635 640

His Gly Ser Lys Ile Lys Ser Glu Ile Glu Pro Lys Glu Leu Thr Glu
645 650 655

Glu Ser Pro Leu Thr Asp Lys Lys Thr Glu Ser Ala Ala Ile Gly Asp
660 665 670

Lys Asn His Glu Ser Val Lys Ser Ala Asp Ile Phe Gln Ser Glu Ile
675 680 685

His Asn Ser Asp Asn Arg Asp Arg Ile Val Ser Glu Ser Val Val Gln
690 695 700

Asp Ser Ser Gly Ser Ser Met Ser Thr Glu Ser Ile Arg Thr Asp Asn
705 710 715 720

Lys Asp Phe Lys Thr Ser Glu Asp Ile Ala Pro Ser Ile Asn Gly Arg
725 730 735

Asn Ser Arg Val Asp Glu Leu Thr Ser Arg Arg Pro Leu
740 745



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2606 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

AGCTCTATTA CGACTCACTA TAGGGAAAGC TGGTACGCCT GCAGGTACCG GTCCGGAATT 60
CCCGGGTCGA CGAGCTCACT AGTCGGCGGC CGCTCTAGAG GATCCAAGCT TAATAGTGTT 120 
TATACGTCTA TTGGCTTATT TTTAAATAGC TTAAAAAGCG GACCATGTAA AAAGGATAAT 180 
GATAATGCAG AGGATAATAT AGATTTTGGT GATGAAGGTA AAACATTTAA AGAGGCAGAT 240 
AATTGTAAAC CATGTTCTCA ATTTACTGTT GATTGTAAAA ATTGTAATGG TGGTGATACA 300
AAAGGGAAGT GCAATGGCAG CAATGGCAAA AAGAATGGAA ATGATTATAT TACTGCAAGT 360 
GATATTGAAA ATGGAGGGAA TTCTATTGGA AATATAGATA TGGTTGTTAG TGATAAGGAT 420 
GCAAATGGAT TTAATGGTTT AGACGCTTGT GGAAGTGCAA ATATCTTTAA AGGTATTAGA 480 
AAAGAACAAT GGAAATGTGC TAAAGTATGT GGTTTAGATG TATGTGGTCT TAAAAATGGT 540 
AATGGTAGTA TAGATAAAGA TCAAAAACAA ATTATAATTA TTAGAGCATT GCTTAAACGT 600
TGGGTAGAAT ATTTTTTAGA AGATTATAAT AAAATTAATG CCAAAATTTC ACATTGTACG 660 
AAAAAGGATA ATGAATCCAC ATGTACAAAT GATTGTCCAA ATAAATGTAC ATGTGTAGAA 720 
GAGTGGATAA ATCAGAAAAG GACAGAATGG AAAAATATAA AAAAACATTA CAAAACACAA 780 
AATGAAAATG GTGACAATAA CATGAAATCT TTGGTTACAG ATATTTTGGG TGCCTTGCAA 840 
CCCCAAAGTG ATGTTAACAA AGCTATAAAA CCTTGTAGTG GTTTAACTGC GTTCGAGAGT 900
TTTTGTGGTC TTAATGGCGC TGATAACTCA GAAAAAAAAG AAGGTGAAGA TTACGATCTT 960 
GTTCTATGTA TGCTTAAAAA TCTTGAAAAA CAAATTCAGG AGTGCAAAAA GAAACATGGC 1020 
GAAACTAGTG TCGAAAATGG TGGCAAATCA TGTACCCCCC TTGACAACAC CACCCTTGAG 1080 
GAGGAACCCA TAGAAGAGGA AAACCAAGTG GAAGCGCCGA ACATTTGTCC AAAACAAACA 1140 
GTGGAAGATA AAAAAAAAGA GGAAGAAGAA GAAACTTGTA CACCGGCATC ACCAGTACCA 1200
GAAAAACCGG TACCTCATGT GGCACGTTGG CGAACATTTA CACCACCTGA GGTATTCAAG 1260 
ATATGGAGGG GAAGGAGAAA TAAAACTACG TGCGAAATAG TGGCAGAAAT GCTTAAAGAT 1320 
AAGAATGGAA GGACTACAGT AGGTGAATGT TATAGAAAAG AAACTTATTC TGAATGGACG 1380 
TGTGATGAAA GTAAGATTAA AATGGGACAG CATGGAGCAT GTATTCCTCC AAGAAGACAA 1440 
AAATTATGTT TACATTATTT AGAAAAAATA ATGACAAATA CAAATGAATT GAAATACGCA 1500
TTTATTAAAT GTGCTGCAGC AGAAACTTTT TTGTTATGGC AAAACTACAA AAAAGATAAG 1560 
AATGGTAATG CAGAAGATCT CGATGAAAAA TTAAAAGGTG GTATTATCCC CGAAGATTTT 1620 
AAACGGCAAA TGTTCTATAC GTTTGCAGAT TATAGAGATA TATGTTTGGG TACGGATATA 1680 
TCATCAAAAA AAGATACAAG TAAAGGTGTA GGTAAAGTAA AATGCAATAT TGATGATGTT 1740
TTTTATAAAA TTAGCAATAG TATTCGTTAC CGTAAAAGTT GGTGGGAAAC AAATGGTCCA 1800
GTTATATGGG AAGGAATGTT ATGCGCTTTA AGTTATGATA CGAGCCTAAA TAATGTTAAT 1860 
CCGGAAACTC ACAAAAAACT TACCGAAGGC AATAACAACT TTGAGAAAGT CATATTTGGT 1920
AGTGATAGTA GCACTACTTT GTCCAAATTT TCTGAAAGAC CTCAATTTCT AAGATGGTTG 1980 
ACTGAATGGG GAGAAAATTT CTGCAAAGAA CAAAAAAAGG AGTATAAGGT GTTGTTGGCA 2040 
AAATGTAAGG ATTGTGATGT TGATGGTGAT GGTAAATGTA ATGGAAAATG TGTTGCGTGC 2100
AAAGATCAAT GTAAACAATA TCATAGTTGG ATTGGAATAT GGATAGATAA TTATAAAAAA 2160 
CAAAAAGGAA GATATACTGA GGTTAAAAAA ATACCTCTGT ATAAAGAAGA TAAAGACGTG 2220 
AAAAACTCAG ATGATGCTCG CGATTATTTA AAAACACAAT TACAAAATAT GAAATGTGTA 2280 
AATGGAACTA CTGATGAAAA TTGTGAGTAT AAGTGTATGC ATAAAACCTC ATCCACAAAT 2340 
AGTGATATGC CCGAATCGTT GGACGAAAAG CCGGAAAAGG TCAAAGACAA GTGTAATTGT 2400
GTACCTAATG AATGCAATGC ATTGAGTGTA AGTGGTAGCG GTTTTCCTGA TGGTCAAGCT 2460 
TACGTACGCG TGCATGCGAC GTCATAGCTC TTCTATAGTG TCACCTAAAT TCAATTCACT 2520 
GGCCGTCGTT TTACAACGTC GTGACTGGGA AAACCTGGCG TTACCCAACT TAATCGCCTT 2580 
GCAGCACATC CCCCTTTCGC CAGCTG 2606 



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 921 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Lys Leu Asn Ser Val Tyr Thr Ser Ile Gly Leu Phe Leu Asn Ser Leu
1 5 10 15

Lys Ser Gly Pro Cys Lys Lys Asp Asn Asp Asn Ala Glu Asp Asn Ile
20 25 30

Asp Phe Gly Asp Glu Gly Lys Thr Phe Lys Glu Ala Asp Asn Cys Lys
35 40 45

Pro Cys Ser Gln Phe Thr Val Asp Cys Lys Asn Cys Asn Gly Gly Asp
50 55 60

Thr Lys Gly Lys Cys Asn Gly Ser Asn Gly Lys Lys Asn Gly Asn Asp
65 70 75 80

Tyr Ile Thr Ala Ser Asp Ile Glu Asn Gly Gly Asn Ser Ile Gly Asn
85 90 95

Ile Asp Met Val Val Ser Asp Lys Asp Ala Asn Gly Phe Asn Gly Leu
100 105 110

Asp Ala Cys Gly Ser Ala Asn Ile Phe Lys Gly Ile Arg Lys Glu Gln
115 120 125

Trp Lys Cys Ala Lys Val Cys Gly Leu Asp Val Cys Gly Leu Lys Asn
130 135 140

Gly Asn Gly Ser Ile Asp Lys Asp Gln Lys Gln Ile Ile Ile Ile Arg
145 150 155 160

Ala Leu Leu Lys Arg Trp Val Glu Tyr Phe Leu Glu Asp Tyr Asn Lys
165 170 175

Ile Asn Ala Lys Ile Ser His Cys Thr Lys Lys Asp Asn Glu Ser Thr
180 185 190

Cys Thr Asn Asp Cys Pro Asn Lys Cys Thr Cys Val Glu Glu Trp Ile
195 200 205

Asn Gln Lys Arg Thr Glu Trp Lys Asn Ile Lys Lys His Tyr Lys Thr
210 215 220

Gln Asn Glu Asn Gly Asp Asn Asn Met Lys Ser Leu Val Thr Asp Ile
225 230 235 240

Leu Gly Ala Leu Gln Pro Gln Ser Asp Val Asn Lys Ala Ile Lys Pro
245 250 255

Cys Ser Gly Leu Thr Ala Phe Glu Ser Phe Cys Gly Leu Asn Gly Ala
260 265 270

Asp Asn Ser Glu Lys Lys Glu Gly Glu Asp Tyr Asp Leu Val Leu Cys
275 280 285

Met Leu Lys Asn Leu Glu Lys Gln Ile Gln Glu Cys Lys Lys Lys His
290 295 300

Gly Glu Thr Ser Val Glu Asn Gly Gly Lys Ser Cys Thr Pro Leu Asp
305 310 315 320

Asn Thr Thr Leu Glu Glu Glu Pro Ile Glu Glu Glu Asn Gln Val Glu
325 330 335

Ala Pro Asn Ile Cys Pro Lys Gln Thr Val Glu Asp Lys Lys Lys Glu
340 345 350

Glu Glu Glu Glu Thr Cys Thr Pro Ala Ser Pro Val Pro Glu Lys Pro
355 360 365

Val Pro His Val Ala Arg Trp Arg Thr Phe Thr Pro Pro Glu Val Phe
370 375 380

Lys Ile Trp Arg Gly Arg Arg Asn Lys Thr Thr Cys Glu Ile Val Ala
385 390 395 400

Glu Met Leu Lys Asp Lys Asn Gly Arg Thr Thr Val Gly Glu Cys Tyr
405 410 415

Arg Lys Glu Thr Tyr Ser Glu Trp Thr Cys Asp Glu Ser Lys Ile Lys
420 425 430

Met Gly Gln His Gly Ala Cys Ile Pro Pro Arg Arg Gln Lys Leu Cys
435 440 445

Leu His Tyr Leu Glu Lys Ile Met Thr Asn Thr Asn Glu Leu Lys Tyr
450 455 460

Ala Phe Ile Lys Cys Ala Ala Ala Glu Thr Phe Leu Leu Trp Gln Asn
465 470 475 480

Tyr Lys Lys Asp Lys Asn Gly Asn Ala Glu Asp Leu Asp Glu Lys Leu
485 490 495

Lys Gly Gly Ile Ile Pro Glu Asp Phe Lys Arg Gln Met Phe Tyr Thr
500 505 510

Phe Ala Asp Tyr Arg Asp Ile Cys Leu Gly Thr Asp Ile Ser Ser Lys
515 520 525

Lys Asp Thr Ser Lys Gly Val Gly Lys Val Lys Cys Asn Ile Asp Asp
530 535 540

Val Phe Tyr Lys Ile Ser Asn Ser Ile Arg Tyr Arg Lys Ser Trp Trp
545 550 555 560

Glu Thr Asn Gly Pro Val Ile Trp Glu Gly Met Leu Cys Ala Leu Ser
565 570 575

Tyr Asp Thr Ser Leu Asn Asn Val Asn Pro Glu Thr His Lys Lys Leu
580 585 590

Thr Glu Gly Asn Asn Asn Phe Glu Lys Val Ile Phe Gly Ser Asp Ser
595 600 605

Ser Thr Thr Leu Ser Lys Phe Ser Glu Arg Pro Gln Phe Leu Arg Trp
610 615 620

Leu Thr Glu Trp Gly Glu Asn Phe Cys Lys Glu Gln Lys Lys Glu Tyr
625 630 635 640

Lys Val Leu Leu Ala Lys Cys Lys Asp Cys Asp Val Asp Gly Asp Gly
645 650 655

Lys Cys Asn Gly Lys Cys Val Ala Cys Lys Asp Gln Cys Lys Gln Tyr
660 665 670

His Ser Trp Ile Gly Ile Trp Ile Asp Asn Tyr Lys Lys Gln Lys Gly
675 680 685

Arg Tyr Thr Glu Val Lys Lys Ile Pro Leu Tyr Lys Glu Asp Lys Asp
690 695 700

Val Lys Asn Ser Asp Asp Ala Arg Asp Tyr Leu Lys Thr Gln Leu Gln
705 710 715 720

Asn Met Lys Cys Val Asn Gly Thr Thr Asp Glu Asn Cys Glu Tyr Lys
725 730 735

Cys Met His Lys Thr Ser Ser Thr Asn Ser Asp Met Pro Glu Ser Leu
740 745 750

Asp Glu Lys Pro Glu Lys Val Lys Asp Lys Cys Asn Cys Val Pro Asn
755 760 765

Glu Cys Asn Ala Leu Ser Val Ser Gly Ser Gly Phe Pro Asp Gly Gln
770 775 780

Ala Phe Gly Gly Gly Val Leu Glu Gly Thr Cys Lys Gly Leu Gly Glu
785 790 795 800

Pro Lys Lys Lys Ile Glu Pro Pro Gln Tyr Asp Pro Thr Asn Asp Ile
805 810 815

Leu Lys Ser Thr Ile Pro Val Thr Ile Val Leu Ala Leu Gly Ser Ile
820 825 830

Ala Phe Leu Phe Met Lys Val Ile Tyr Ile Tyr Val Trp Tyr Ile Tyr
835 840 845

Met Leu Cys Val Gly Ala Leu Asp Thr Tyr Ile Cys Gly Cys Ile Cys
850 855 860

Ile Cys Ile Phe Ile Cys Val Ser Val Tyr Val Cys Val Tyr Val Tyr
865 870 875 880

Val Phe Leu Tyr Met Cys Val Phe Tyr Ile Tyr Phe Ile Tyr Ile Tyr
885 890 895

Val Phe Ile Leu Lys Met Lys Lys Met Lys Lys Met Lys Lys Met Lys
900 905 910

Lys Met Lys Lys Arg Lys Lys Arg Ile
915 920

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2101 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GGAACAGGGT GATAATAAAG TAGGAGCCTG TGCTCCGTAT AGACGATTAC ATTTATGTGA 60 
TTATAATTTG GAATCTATAG ACACAACGTC GACGACGCAT AAGTTGTTGT TAGAGGTGTG 120 
TATGGCAGCA AAATACGAAG GAAACTCAAT AAATACACAT TATACACAAC ATCAACGAAC 180 
TAATGAGGAT TCTGCTTCCC AATTATGTAC TGTATTAGCA CGAAGTTTTG CAGATATAGG 240 
TGATATCGTA AGAGGAAAAG ATCTATATCT CGGTTATGAT AATAAAGAAA AAGAACAAAG 300 
AAAAAAATTA GAACAGAAAT TGAAAGATAT TTTCAAGAAA ATACATAAGG ACGTGATGAA 360 
GACGAATGGC GCACAAGAAC GCTACATAGA TGATGCCAAA GGAGGAGATT TTTTTCAATT 420 
AAGAGAAGAT TGGTGGACGT CGAATCGAGA AACAGTATGG AAAGCATTAA TATGTCATGC 480 
ACCAAAAGAA GCTAATTATT TTATAAAAAC AGCGTGTAAT GTAGGAAAAG GAACTAATGG 540 
TCAATGCCAT TGCATTGGTG GAGATGTTCC CACATATTTC GATTATGTGC CGCAGTATCT 600 
TCGCTGGTTC GAGGAATGGG CAGAAGACTT TTGCAGGAAA AAAAAAAAAA AACTAGAAAA 660 
TTTGCAAAAA CAGTGTCGTG ATTACGAACA AAATTTATAT TGTAGTGGTA ATGGCTACGA 720
TTGCACAAAA ACTATATATA AAAAAGGTAA ACTTGTTATA GGTGAACATT GTACAAACTG 780 
TTCTGTTTGG TGTCGTATGT ATGAAACTTG GATAGATAAC CAGAAAAAAG AATTTCTAAA 840 
ACAAAAAAGA AAATACGAAA CAGAAATATC AGGTGGTGGT AGTGGTAAGA GTCCTAAAAG 900 
GACAAAACGG GCTGCACGTA GTAGTAGTAG TAGTGATGAT AATGGGTATG AAAGTAAATT 960 
TTATAAAAAA CTGAAAGAAG TTGGCTACCA AGATGTCGAT AAATTTTTAA AAATATTAAA 1020 
CAAAGAAGGA ATATGTCAAA AACAACCTCA AGTAGGAAAT GAAAAAGCAG ATAATGTTGA 1080 
TTTTACTAAT GAAAAATATG TAAAAACATT TTCTCGTACA GAAATTTGTG AACCGTGCCC 1140 
ATGGTGTGGA TTGGAAAAAG GTGGTCCACC ATGGAAAGTT AAAGGTGACA AAACCTGCGG 1200 
AAGTGCAAAA ACAAAGACAT ACGATCCTAA AAATATTACC GATATACCAG TACTCTACCC 1260 
TGATAAATCA CAGCAAAATA TACTAAAAAA ATATAAAAAT TTTTGTGAAA AAGGTGCACC 1320 
TGGTGGTGGT CAAATTAAAA AATGGCAATG TTATTATGAT GAACATAGGC CTAGTAGTAA 1380 
AAATAATAAT AATTGTGTAG AAGGAACATG GGACAAGTTT ACACAAGGTA AACAAACCGT 1440 
TAAGTCCTAT AATGTTTTTT TTTGGGATTG GGTTCATGAT ATGTTACACG ATTCTGTAGA 1500 
GTGGAAGACA GAACTTAGTA AGTGTATAAA TAATAACACT AATGGCAACA CATGTAGAAA 1560 
CAATAATAAA TGTAAAACAG ATTGTGGTTG TTTTCAAAAA TGGGTTGAAA AAAAACAACA 1620 
AGAATGGATG GCAATAAAAG ACCATTTTGG AAAGCAAACA GATATTGTCC AACAAAAAGG 1680 
TCTTATCGTA TTTAGTCCCT ATGGAGTTCT TGACCTTGTT TTGAAGGGCG GTAATCTGTT 1740 
GCAAAATATT AAAGATGTTC ATGGAGATAC AGATGACATA AAACACATTA AGAAACTGTT 1800 
GGATGAGGAA GACGCAGTAG CAGTTGTTCT TGGTGGCAAG GACAATACCA CAATTGATAA 1860 
ATTACTACAA CACGAAAAAG AACAAGCAGA ACAATGCAAA CAAAAGCAGG AAGAATGCGA 1920 
GAAAAAAGCA CAACAAGAAA GTCGTGGTCG CTCCGCCGAA ACCCGCGAAG ACGAAAGGAC 1980 
ACAACAACCT GCTGATAGTG CCGGCGAAGT CGAAGAAGAA GAAGACGACG ACGACTACGA 2040 
CGAAGACGAC GAAGATGACG ACGTAGTCCA GGACGTAGAT GTAAGTGAAA TAAGAGGTCC 2100 
G 2101 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 700 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Plasmodium falciparum 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Glu Gln Gly Asp Asn Lys Val Gly Ala Cys Ala Pro Tyr Arg Arg Leu
1 5 10 15

His Leu Cys Asp Tyr Asn Leu Glu Ser Ile Asp Thr Thr Ser Thr Thr
20 25 30

His Lys Leu Leu Leu Glu Val Cys Met Ala Ala Lys Tyr Glu Gly Asn
35 40 45

Ser Ile Asn Thr His Tyr Thr Gln His Gln Arg Thr Asn Glu Asp Ser
50 55 60

Ala Ser Gln Leu Cys Thr Val Leu Ala Arg Ser Phe Ala Asp Ile Gly
65 70 75 80

Asp Ile Val Arg Gly Lys Asp Leu Tyr Leu Gly Tyr Asp Asn Lys Glu
85 90 95

Lys Glu Gln Arg Lys Lys Leu Glu Gln Lys Leu Lys Asp Ile Phe Lys
100 105 110

Lys Ile His Lys Asp Val Met Lys Thr Asn Gly Ala Gln Glu Arg Tyr
115 120 125

Ile Asp Asp Ala Lys Gly Gly Asp Phe Phe Gln Leu Arg Glu Asp Trp
130 135 140

Trp Thr Ser Asn Arg Glu Thr Val Trp Lys Ala Leu Ile Cys His Ala
145 150 155 160

Pro Lys Glu Ala Asn Tyr Phe Ile Lys Thr Ala Cys Asn Val Gly Lys
165 170 175

Gly Thr Asn Gly Gln Cys His Cys Ile Gly Gly Asp Val Pro Thr Tyr
180 185 190

Phe Asp Tyr Val Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu
195 200 205

Asp Phe Cys Arg Lys Lys Lys Lys Lys Leu Glu Asn Leu Gln Lys Gln
210 215 220

Cys Arg Asp Tyr Glu Gln Asn Leu Tyr Cys Ser Gly Asn Gly Tyr Asp
225 230 235 240

Cys Thr Lys Thr Ile Tyr Lys Lys Gly Lys Leu Val Ile Gly Glu His
245 250 255

Cys Thr Asn Cys Ser Val Trp Cys Arg Met Tyr Glu Thr Trp Ile Asp
260 265 270

Asn Gln Lys Lys Glu Phe Leu Lys Gln Lys Arg Lys Tyr Glu Thr Glu
275 280 285

Ile Ser Gly Gly Gly Ser Gly Lys Ser Pro Lys Arg Thr Lys Arg Ala
290 295 300

Ala Arg Ser Ser Ser Ser Ser Asp Asp Asn Gly Tyr Glu Ser Lys Phe
305 310 315 320

Tyr Lys Lys Leu Lys Glu Val Gly Tyr Gln Asp Val Asp Lys Phe Leu
325 330 335

Lys Ile Leu Asn Lys Glu Gly Ile Cys Gln Lys Gln Pro Gln Val Gly
340 345 350

Asn Glu Lys Ala Asp Asn Val Asp Phe Thr Asn Glu Lys Tyr Val Lys
355 360 365

Thr Phe Ser Arg Thr Glu Ile Cys Glu Pro Cys Pro Trp Cys Gly Leu
370 375 380

Glu Lys Gly Gly Pro Pro Trp Lys Val Lys Gly Asp Lys Thr Cys Gly
385 390 395 400

Ser Ala Lys Thr Lys Thr Tyr Asp Pro Lys Asn Ile Thr Asp Ile Pro
405 410 415

Val Leu Tyr Pro Asp Lys Ser Gln Gln Asn Ile Leu Lys Lys Tyr Lys
420 425 430

Asn Phe Cys Glu Lys Gly Ala Pro Gly Gly Gly Gln Ile Lys Lys Trp
435 440 445

Gln Cys Tyr Tyr Asp Glu His Arg Pro Ser Ser Lys Asn Asn Asn Asn
450 455 460

Cys Val Glu Gly Thr Trp Asp Lys Phe Thr Gln Gly Lys Gln Thr Val
465 470 475 480

Lys Ser Tyr Asn Val Phe Phe Trp Asp Trp Val His Asp Met Leu His
485 490 495

Asp Ser Val Glu Trp Lys Thr Glu Leu Ser Lys Cys Ile Asn Asn Asn
500 505 510

Thr Asn Gly Asn Thr Cys Arg Asn Asn Asn Lys Cys Lys Thr Asp Cys
515 520 525

Gly Cys Phe Gln Lys Trp Val Glu Lys Lys Gln Gln Glu Trp Met Ala
530 535 540

Ile Lys Asp His Phe Gly Lys Gln Thr Asp Ile Val Gln Gln Lys Gly
545 550 555 560

Leu Ile Val Phe Ser Pro Tyr Gly Val Leu Asp Leu Val Leu Lys Gly
565 570 575

Gly Asn Leu Leu Gln Asn Ile Lys Asp Val His Gly Asp Thr Asp Asp
580 585 590

Ile Lys His Ile Lys Lys Leu Leu Asp Glu Glu Asp Ala Val Ala Val
595 600 605

Val Leu Gly Gly Lys Asp Asn Thr Thr Ile Asp Lys Leu Leu Gln His
610 615 620

Glu Lys Glu Gln Ala Glu Gln Cys Lys Gln Lys Gln Glu Glu Cys Glu
625 630 635 640

Lys Lys Ala Gln Gln Glu Ser Arg Gly Arg Ser Ala Glu Thr Arg Glu
645 650 655

Asp Glu Arg Thr Gln Gln Pro Ala Asp Ser Ala Gly Glu Val Glu Glu
660 665 670

Glu Glu Asp Asp Asp Asp Tyr Asp Glu Asp Asp Glu Asp Asp Asp Val
675 680 685

Val Gln Asp Val Asp Val Ser Glu Ile Arg Gly Pro
690 695 700

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8220 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

AAAAATGGGG CCCAAGGAGG CTGCAGGTGG GGATGATATT GAGGATGAAA GTGCCAAACA 60 
TATGTTTGAT AGGATAGGAA AAGATGTGTA CGATAAAGTA AAAGAGGAAG CTAAAGAACG 120 
TGGTAAAGGC TTGCAAGGAC GTTTGTCAGA AGCAAAATTT GAGAAAAATG AAAGCGATCC 180 

ACAAACACCA GAAGATCCAT GCGATCTTGA TCATAAATAT CATACAAATG TAACTACTAA 240
TGTAATTAAT CCGTGCGCTG ATAGATCTGA CGTGCGTTTT TCCGATGAAT ATGGAGGTCA 300 
ATGTACACAT AATAGAATAA AAGATAGTCA ACAGGGTGAT AATAAAGGTG CATGTGCTCC 360 
ATATAGGCGA TTGCATGTAT GCGATCAAAA TTTAGAACAG ATAGAGCCTA TAAAAATAAC 420 
AAATACTCAT AATTTATTGG TAGATGTGTG TATGGCAGCA AAATTTGAAG GACAATCAAT 480 

AACACAAGAT TATCCAAAAT ATCAAGCAAC ATATGGTGAT TCTCCTTCTC AAATATGTAC 540
TATGCTGGCA CGAAGTTTTG CGGACATAGG GGACATTGTC AGAGGAAGAG ATTTGTATTT 600 
AGGTAATCCA CAAGAAATAA AACAAAGACA ACAATTAGAA AATAATTTGA AAACAATTTT 660 
CGGGAAAATA TATGAAAAAT TGAATGGCGC AGAAGCACGC TACGGAAATG ATCCGGAATT 720 
TTTTAAATTA CGAGAAGATT GGTGGACTGC TAATCGAGAA ACAGTATGGA AAGCCATCAC 780 

ATGTAACGCT TGGGGTAATA CATATTTTCA TGCAACGTGC AATAGAGGAG AACGAACTAA 840
AGGTTACTGC CGGTGTAACG ACGACCAAGT TCCCACATAT TTTGATTATG TGCCGCAGTA 900 
TCTTCGCTGG TTCGAGGAAT GGGCAGAAGA TTTTTGTAGG AAAAAAAATA AAAAAATAAA 960 
AGATGTTAAA AGAAATTGTC GTGGAAAAGA TAAAGAGGAT AAGGATCGAT ATTGTAGCCG 1020 
TAATGGCTAC GATTGCGAAA AAACTAAACG AGCGATTGGT AAGTTGCGTT ATGGTAAGCA 1080 

ATGCATTAGC TGTTTGTATG CATGTAATCC TTACGTTGAT TGGATAAATA ACCAAAAAGA 1140
ACAATTTGAC AAACAGAAAA AAAAATATGA TGAAGAAATA AAAAAATATG AAAATGGAGC 1200 
ATCAGGTGGT AGTAGGCAAA AACGGGATGC AGGTGGTACA ACTACTACTA ATTATGATGG 1260 
ATATGAAAAA AAATTTTATG ACGAACTTAA TAAAAGTGAA TATAGAACCG TTGATAAATT 1320 
i x j.o\jw-iAmAA TTAAGTAATG AnGAAATATG CACAAAAGTT AAAGAGGAAG AAGGAGGAAC 1380

AATTGATTTT AAAAACGTTA ATAGTGATAG TACTAGTGGT GCTAGTGGCA CTAATGTTGA 1440
AAGTCAAGGA ACATTTTATC GTTCAAAATA TTGCCAACCC TGCCCTTATT GTGGAGTGAA 1500 
AAAGGTAAAT AATGGTGGTA GTAGTAATGA ATGGGAAGAG AAAAATAATG GCAAGTGCAA 1560 
GAGTGGAAAA CTTTATGAGC CTAAACCCGA CAAAGAAGGT ACTACTATTA CAATCCTTAA 1620 
AAGTGGTAAA GGACATGATG ATATTGAAGA AAAATTAAAC AAATTTTGTG ATGAAAAAAA 1680
TGGTGATACA ATAAATAGTG GTGGTAGTGG TACGGGTGGT AGTGGTGGTG GTAACAGTGG 1740
TAGACAGGAA TTGTATGAAG AATGGAAATG TTATAAAGGT GAAGATGTAG TGAAAGTTGG 1800
ACACGATGAG GATGACGAGG AGGATTATGA AAATGTAAAA AATGCAGGCG GATTATGTAT 1860
ATTAAAAAAC CAAAAAAAGA ATAAAGAAGA AGGTGGAAAT ACGTCTGAAA AGGAGCCTGA 1920
TGAAATCCAA AAGACATTCA ATCCTTTTTT TTACTATTGG GTTGCACATA TGTTAAAAGA 1980
TTCCATACAT TGGAAAAAAA AACTTCAGAG ATGTTTACAA AATGGTAACA GAATAAAATG 2040
TGGAAACAAT AAATGTAATA ATGATTGTGA ATGTTTTAAA AGATGGATTA CACAAAAAAA 2100
AGACGAATGG GGGAAAATAG TACAACATTT TAAAACGCAA AATATTAAAG GTAGAGGAGG 2160
TAGTGACAAT ACGGCAGAAT TAATCCCATT TGATCACGAT TATGTTCTTC AATACAATTT 2220
GCAAGAAGAA TTTTTGAAAG GCGATTCCGA AGACGCTTCC GAAGAAAAAT CCGAAAATAG 2280
TCTGGATGCA GAGGAGGCAG AGGAACTAAA ACACCTTCGC GAAATCATTG AAAGTGAAGA 2340
CAATAATCAA GAAGCATCTG TTGGTGGTGG CGTCACTGAA CAAAAAAATA TAATGGATAA 2400
ATTGCTCAAC TACGAAAAAG ACGAAGCCGA TTTATGCCTA GAAATTCACG AAGATGAGGA 2460
AGAGGAAAAA GAAAAAGGAG ACGGAAACGA ATGTATCGAA GAGGGCGAAA ATTTTCGTTA 2520
TAATCCATGT AGTGGCGAAA GTGGTAACAA ACGATACCCC GTTCTTGCGA ACAAAGTAGC 2580
GTATCAAATG CATCACAAGG CAAAGACACA ATTGGCTAGT CGTGCTGGTA GAAGTGCGTT 2640
GAGAGGTGAT ATATCCTTAG CGCAATTTAA AAATGGTCGT AACGGAAGTA CATTGAAAGG 2700
ACAAATTTGC AAAATTAACG AAAACTATTC CAATGATAGT CGTGGTAATA GTGGTGGACC 2760
ATGTACAGGC AAAGATGGAG ATCACGGAGG TGTGCGCATG AGAATAGGAA CGGAATGGTC 2820
AAATATTGAA GGAAAAAAAC AAACGTCATA CAAAAACGTC TTTTTACCTC CCCGACGAGA 2880
ACACATGTGT ACATCCAATT TAGAAAATTT AGATGTTGGT AGTGTCACTA AAAATGATAA 2940
GGCTAGCCAC TCATTATTGG GAGATGTTCA GCTCGCAGCA AAAACTGATG CAGCTGAGAT 3000
AATAAAACGC TATAAAGATC AAAATAATAT ACAACTAACT GATCCAATAC AACAAAAAGA 3060
CCAGGAGGCT ATGTGTCGAG CTGTACGTTA TAGTTTTGCC GATTTAGGAG ACATTATTCG 3120
AGGAAGAGAT ATGTGGGATG AGGATAAGAG CTCAACAGAC ATGGAAACAC GTTTGATAAC 3180
CGTATTTAAA AACATTAAAG AAAAACATGA TGGAATCAAA GACAACCCTA AATATACCGG 3240
TGATGAAAGC AAAAAGCCCG CATATAAAAA ATTACGAGCA GATTGGTGGG AAGCAAATAG 3300
ACATCAAGTG TGGAGAGCCA TGAAATGCGC AACAAAAGGC ATCATATGTC CTGGTATGCC 3360
AGTTGACGAT TATATCCCCC AACGTTTACG CTGGATGACT GAATGGGCTG AATGGTATTG 3420
TAAAGCGCAA TCACAGGAGT ATGACAAGTT AAAAAAAATC TGTGCAGATT GTATGAGTAA 3480
GGGTGATGGA AAATGTACGC AAGGTGATGT CGATTGTGGA AAGTGCAAAG CAGCATGTGA 3540
TAAATATAAA GAGGAAATAG AAAAATGGAA TGAACAATGG AGAAAAATAT CAGATAAATA 3600
CAATCTATTA TACCTACAAG CAAAAACTAC TTCTACTAAT CCTGGCCGTA CTGTTCTTGG 3660
TGATGACGAT CCCGACTATC AACAAATGGT AGATTTTTTG ACCCCAATAC ACAAAGCAAG 3720
TATTGCCGCA CGTGTTCTTG TTAAACGTGC TGCTGGTAGT CCCACTGAGA TCGCCGCCGC 3780
CGCCCCGATC ACCCCCTACA GTACTGCTGC CGGATATATA CACCAGGAAA TAGGATATGG 3840
GGGGTGCCAG GAACAAACAC AATTTTGTGA AAAAAAACAT GGTGCAACAT CAACTAGTAC 3900
CACGAAAGAA AACAAAGAAT ACACCTTTAA ACAACCTCCG CCGGAGTATG CTACAGCGTG 3960
TGATTGCATA AATAGGTCGG AAACAGAGGA GCCGAAGAAA AAGGAAGAAA ATGTAGAGAG 4020
TGCCTGCAAA ATAGTGGAGA AAATACTTGA GGGTAAGAAT GGAAGGACTA CAGTAGGTGA 4080
ATGTAATCCA AAAGAGAGTT ATCCTGATTG GGATTGCAAA AACAATATTG ACATTAGTCA 4140
TGATGGTGCT TGTATGCCTC CAAGGAGACA AAAACTATGT TTATATTATA TAGCACATGA 4200
GAGTCAAACA GAAAATATAA AAACAGACGA TAATTTGAAA GATGCTTTTA TTAAAACTGC 4260
AGCAGCAGAA ACTTTTCTTT CATGGCAATA TTATAAGAGT AAGAATGATA GTGAAGCTAA 4320
AATATTAGAT AGAGGCCTTA TTCCATCCCA ATTTTTAAGA TCCATGATGT ACACGTTTGG 4380
AGATTATAGA GATATATGTT TGAACACAGA TATATCTAAA AAACAAAATG ATGTAGCTAA 4440
GGCAAAAGAT AAAATAGGTA AATTTTTCTC AAAAGATGGC AGCAAATCTC CTAGTGGCTT 4500
ATCACGCCAA GAATGGTGGA AAACAAATGG TCCAGAGATT TGGAAAGGAA TGTTATGTGC 4560
CTTAACAAAA TACGTCACAG ATACCGATAA CAAAAGAAAA ATCAAAAACG ACTACTCATA 4620
CGATAAAGTC AACCAATCCC AAAATGGCAA CCCTTCCCTT GAAGAGTTTG CTGCTAAACC 4680
TCAATTTCTA CGTTGGATGA TCGAATGGGG AGAAGAGTTT TGTGCTGAAC GTCAGAAGAA 4740
GGAAAATATC ATAAAAGATG CATGTAATGA AATAAATTCT ACACAACAGT GTAATGATGC 4800
GAAACATCGT TGTAATCAAG CATGTAGAGC ATATCAAGAA TATGTTGAAA ATAAAAAAAA 4860
AGAATTTTCG GGACAAACAA ATAACTTTGT TCTAAAGGCA AATGTTCAGC CCCAAGATCC 4920
AGAATATAAA GGATATGAAT ATAAAGACGG CGTACAACCG ATACAGGGGA ATGAGTATTT 4980
ACTGCAAAAA TGTGATAATA ATAAATGTTC TTGCATGGAT GGAAATGTAC TTTCCGTCTC 5040
TCCAAAAGAA AAACCTTTTG GAAAATATGC CCATAAATAT CCTGAGAAAT GTGATTGTTA 5100
TCAAGGAAAA CATGTACCTA GCATACCACC TCCCCCCCCA CCTGTACAAC CACAACCGGA 5160
a^uaCuaaua tixAACAGTAG Au^T"ri'iiCAu caTAgxaaaa ALALT±A'rrTA AAGACACAAA 5220
CAATTTTTCC GACGCTTGTG GTCTAAAATA CGGCAAAACC GCACCATCCA GTTGGAAATG 5280
TATACCAAGT GACACAAAAA GTGGTGCTGG TGCCACCACC GGCAAAAGTG GTAGTGATAG 5340
TGGTAGTATT TGTATCCCAC CCAGGAGGCG ACGATTATAT GTGGGGAAAC TACAGGAGTG 5400
GGCTACCGCG CTCCCACAAG GTGAGGGCGC CGCGCCGTCC CACTCACGCG CCGACGACTT 5460
GCGCAATGCG TTCATCCAAT CTGCTGCAAT AGAGACTTTT TTCTTATGGG ATAGATATAA 5520
AGAAGAGAAA AAACCACAGG GTGATGGGTC ACAACAAGCA CTATCACAAC TAACCAGTAC 5580
ATACAGTGAT GACGAGGAGG ACCCCCCCGA CAAACTGTTA CAAAATGGTA AGATACCCCC 5640 
CGATTTTTTG AGATTAATGT TCTATACATT AGGAGATTAT AGGGATATTT TAGTACACGG 5700
TGGTAACACA AGTGACAGTG GTAACACAAA TGGTAGTAAC AACAACAATA TTGTGCTTGA 5760 
AGCGAGTGGT AACAAGGAGG ACATGCAAAA AATACAAGAG AAAATAGAAC AAATTCTCCC 5820
AAAAAATGGT GGCACACCTC TTGTCCCAAA ATCTAGTGCC CAAACACCTG ATAAATGGTG 5880 
GAATGAACAC GCCGAATCTA TCTGGAAAGG TATGATATGT GCATTGACAT ATACAGAAAA 5940 
GAACCCTGAC ACCAGTGCAA GAGGCGACGA AAACAAAATA GAAAAGGATG ATGAAGTGTA 6000 
CGAGAAATTT TTTGGCAGCA CAGCCGACAA ACATGGCACA GCCTCAACCC CAACCGGCAC 6060 

ATACAAAACC CAATACGACT ACGAAAAAGT CAAACTTGAG GATACAAGTG GTGCCAAAAC 6120
CCCCTCAGCC TCTAGTGATA CACCCCTTCT CTCCGATTTC GTGTTACGCC CCCCCTACTT 6180 
CCGTTACCTT GAAGAATGGG GTCAAAATTT TTGTAAAAAA AGAAAGCATA AATTGGCACA 6240 
AATAAAACAT GAGTGTAAAG TAGAAGAAAA TGGTGGTGGT AGTCGTCGTG GTGGTATAAC 6300 
AAGACAATAT AGTGGGGATG GCGAAGCGTG TAATGAGATG CTTCCAAAAA ACGATGGAAC 6360 

TGTTCCGGAT TTAGAAAAGC CGAGTTGTGC CAAACCTTGT AGTTCTTATA GAAAATGGAT 6420
AGAAAGCAAG GGAAAAGAGT TTGAGAAACA AGAAAAGGCA TATGAACAAC AAAAAGACAA 6480 
ATGTGTAAAT GGAAGTAATA AGCATGATAA TGGATTTTGT GAAACACTAA CAACGTCCTC 6540
TAAAGCTAAA GACTTTTTAA AAACCTTAGG ACCATGTAAA CCTAATAATG TAGAGGGTAA 6600 
AACAATTTTT GATGATGATA AAACCTTTAA ACATACAAAA GATTGTGATC CATGTCTTAA 6660 

ATTTAGTGTT AATTGTAAAA AAGATGAATG TGATAATTCT AAAGGAACCG ATTGCCGAAA 6720
TAAAAATAGT ATTGATGCAA CAGATATTGA AAATGGAGTG GATTCTACTG TACTAGAAAT 6780 
GCGTGTCAGT GCTGATAGTA AAAGTGGATT TAATGGTGAT GGTTTAGAGA ATGCTTGTAG 6840
AGGTGCTGGT ATCTTTGAAG GTATTAGAAA AGATGAATGG AAATGTCGTA ATGTATGTGG 6900 
TTATGTTGTA TGTAAACCGG AAAACGTTAA TGGGGAAGCA AAGGGAAAAC ACATTATACA 6960 

AATTAGAGCA CTGGTTAAAC GTTGGGTAGA ATATTTTTTT GAAGATTATA ATAAAATAAA 7020
ACATAAAATT TCACATCGCA TAAAAAATGG TGAAATATCT CCATGTATAA AAAATTGTGT 7080 
AGAAAAATGG GTAGATCAGA AAAGAAAAGA ATGGAAGGAA ATTACTGAAC GTTTCAAAGA 7140 
TCAATATAAA AATGACAATT CAGATGATGA CAATGTGAGA AGTTTTTTGG AGACCTTGAT 7200 
ACCTCAAATT ACTGATGCAA ACGCTAAAAA TAAGGTTATA AAATTAAGTA AGTTCGGTAA 7260 

TTCTTGTGGA TGTAGTGCCA GTGCGAACGA ACAAAACAAA AATGGTGAAT ACAAGGACGC 7320
TATAGATTGT ATGCTTAAAA AGCTTAAAGA TAAAATTGGC GAGTGCGAAA AGAAACACCA 7380 
TCAAACTAGT GATACCGAGT GTTCCGACAC ACCACAACCG CAAACCCTTG AAGACGAAAC 7440 
TTTGGATGAT GATATAGAAA CAGAGGAGGC GAAGAAGAAC ATGATGCCGA AAATTTGTGA 7500 
AAATGTGTTA AAAACAGCAC AACAAGAGGA TGAAGGCGGT TGTGTCCCAG CAGAAAATAG 7560 

TGAAGAACCG GCAGCAACAG ATAGTGGTAA GGAAACCCCC GAACAAACCC CCGTTCTCAA 7620
ACCCGAAGAA GAAGCAGTAC CGGAACCACC ACCTCCACCC CCACAGGAAA AAGCCCCGGC 7680 
ACCAATACCC CAACCACAAC CACCAACCCC CCCCACACAA CTCTTGGATA ATCCCCACGT 7740 
TCTAACCGCC CTGGTGACCT CCACCCTCGC CTGGAGCGTT GGCATCGGTT TTGCTACATT 7800
CACTTATTTT TATCTAAAGG TAAATGGAAG TATATATATG GGGATGTGGA TGTATGTGGA 7860 

TGTATGTGAA TGTATGTGGA TGTATGTGGA TGTATGTGGA TGTGTTTTAT GGATATGTAT 7920
TTGTGATTAT GTTTGGATAT ATATATATAT ATATATATGT TTATGTATAT GTGTTTTTGG 7980 
ATATATATAT GTGTATGTAT ATGATTTTCT GTATATGTAT TTGTGGGTTA AGGATATATA 8040 
TATATGGATG TACTTGTATG TGTTTTATAT ATATATTTTA TATATATGTA TTTATATTAA 8100 
AAAAGAAATA TAAAAACAAA TTTATTAAAA TGAAAAAAAG AAAAATGAAA TATAAAAAAA 8160 

AATTTATTAA AATAAAAAAA AAAAAAAAAA AAAAGGAGAA AAATTTTTTA AAAAATAATA 8220

(2) INFORMATION FOR SEQ ID NO: 12: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2710 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

Asn Val Met Val Glu Leu Ala Lys Met Gly Pro Lys Glu Ala Ala Gly
1 5 10 15

Gly Asp Asp Ile Glu Asp Glu Ser Ala Lys His Met Phe Asp Arg Ile
20 25 30

Gly Lys Asp Val Tyr Asp Lys Val Lys Glu Glu Ala Lys Glu Arg Gly
35 40 45

Lys Gly Leu Gln Gly Arg Leu Ser Glu Ala Lys Phe Glu Lys Asn Glu
50 55 60

Ser Asp Pro Gln Thr Pro Glu Asp Pro Cys Asp Leu Asp His Lys Tyr
65 70 75 80

His Thr Asn Val Thr Thr Asn Val Ile Asn Pro Cys Ala Asp Arg Ser
85 90 95

Asp Val Arg Phe Ser Asp Glu Tyr Gly Gly Gln Cys Thr His Asn Arg
100 105 110

Ile Lys Asp Ser Gln Gln Gly Asp Asn Lys Gly Ala Cys Ala Pro Tyr
115 120 125

Arg Arg Leu His Val Cys Asp Gln Asn Leu Glu Gln Ile Glu Pro Ile
130 135 140

Lys Ile Thr Asn Thr His Asn Leu Leu Val Asp Val Cys Met Ala Ala
145 150 155 160

Lys Phe Glu Gly Gln Ser Ile Thr Gln Asp Tyr Pro Lys Tyr Gln Ala
165 170 175

Thr Tyr Gly Asp Ser Pro Ser Gln Ile Cys Thr Met Leu Ala Arg Ser
180 185 190

Phe Ala Asp Ile Gly Asp Ile Val Arg Gly Arg Asp Leu Tyr Leu Gly
195 200 205

Asn Pro Gln Glu Ile Lys Gln Arg Gln Gln Leu Glu Asn Asn Leu Lys
210 215 220

Thr Ile Phe Gly Lys Ile Tyr Glu Lys Leu Asn Gly Ala Glu Ala Arg
225 230 235 240

Tyr Gly Asn Asp Pro Glu Phe Phe Lys Leu Arg Glu Asp Trp Trp Thr
245 250 255

Ala Asn Arg Glu Thr Val Trp Lys Ala Ile Thr Cys Asn Ala Trp Gly
260 265 270

Asn Thr Tyr Phe His Ala Thr Cys Asn Arg Gly Glu Arg Thr Lys Gly
275 280 285

Tyr Cys Arg Cys Asn Asp Asp Gln Val Pro Thr Tyr Phe Asp Tyr Val
290 295 300

Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu Asp Phe Cys Arg
305 310 315 320

Lys Lys Asn Lys Lys Ile Lys Asp Val Lys Arg Asn Cys Arg Gly Lys
325 330 335

Asp Lys Glu Asp Lys Asp Arg Tyr Cys Ser Arg Asn Gly Tyr Asp Cys
340 345 350

Glu Lys Thr Lys Arg Ala Ile Gly Lys Leu Arg Tyr Gly Lys Gln Cys
355 360 365

Ile Ser Cys Leu Tyr Ala Cys Asn Pro Tyr Val Asp Trp Ile Asn Asn
370 375 380

Gln Lys Glu Gln Phe Asp Lys Gln Lys Lys Lys Tyr Asp Glu Glu Ile
385 390 395 400

Lys Lys Tyr Glu Asn Gly Ala Ser Gly Gly Ser Arg Gln Lys Arg Asp
405 410 415

Ala Gly Gly Thr Thr Thr Thr Asn Tyr Asp Gly Tyr Glu Lys Lys Phe
420 425 430

Tyr Asp Glu Leu Asn Lys Ser Glu Tyr Arg Thr Val Asp Lys Phe Leu
435 440 445

Glu Lys Leu Ser Asn Glu Glu Ile Cys Thr Lys Val Lys Asp Glu Glu
450 455 460

Gly Gly Thr Ile Asp Phe Lys Asn Val Asn Ser Asp Ser Thr Ser Gly
465 470 475 480

Ala Ser Gly Thr Asn Val Glu Ser Gln Gly Thr Phe Tyr Arg Ser Lys
485 490 495

Tyr Cys Gln Pro Cys Pro Tyr Cys Gly Val Lys Lys Val Asn Asn Gly
500 505 510

Gly Ser Ser Asn Glu Trp Glu Glu Lys Asn Asn Gly Lys Cys Lys Ser
515 520 525

Gly Lys Leu Tyr Glu Pro Lys Pro Asp Lys Glu Gly Thr Thr Ile Thr
530 535 540

Ile Leu Lys Ser Gly Lys Gly His Asp Asp Ile Glu Glu Lys Leu Asn
545 550 555 560

Lys Phe Cys Asp Glu Lys Asn Gly Asp Thr Ile Asn Ser Gly Gly Ser
565 570 575

Gly Thr Gly Gly Ser Gly Gly Gly Asn Ser Gly Arg Gln Glu Leu Tyr
580 585 590

Glu Glu Trp Lys Cys Tyr Lys Gly Glu Asp Val Val Lys Val Gly His
595 600 605

Asp Glu Asp Asp Glu Glu Asp Tyr Glu Asn Val Lys Asn Ala Gly Gly
610 615 620

Leu Cys Ile Leu Lys Asn Gln Lys Lys Asn Lys Glu Glu Gly Gly Asn
625 630 635 640

Thr Ser Glu Lys Glu Pro Asp Glu Ile Gln Lys Thr Phe Asn Pro Phe
645 650 655

Phe Tyr Tyr Trp Val Ala His Met Leu Lys Asp Ser Ile His Trp Lys
660 665 670

Lys Lys Leu Gln Arg Cys Leu Gln Asn Gly Asn Arg Ile Lys Cys Gly
675 680 685

Asn Asn Lys Cys Asn Asn Asp Cys Glu Cys Phe Lys Arg Trp Ile Thr
690 695 700

Gln Lys Lys Asp Glu Trp Gly Lys Ile Val Gln His Phe Lys Thr Gln
705 710 715 720

Asn Ile Lys Gly Arg Gly Gly Ser Asp Asn Thr Ala Glu Leu Ile Pro
725 730 735

Phe Asp His Asp Tyr Val Leu Gln Tyr Asn Leu Gln Glu Glu Phe Leu
740 745 750

Lys Gly Asp Ser Glu Asp Ala Ser Glu Glu Lys Ser Glu Asn Ser Leu
755 760 765

Asp Ala Glu Glu Ala Glu Glu Leu Lys His Leu Arg Glu Ile Ile Glu
770 775 780

Ser Glu Asp Asn Asn Gln Glu Ala Ser Val Gly Gly Gly Val Thr Glu
785 790 795 800

Gln Lys Asn Ile Met Asp Lys Leu Leu Asn Tyr Glu Lys Asp Glu Ala
805 810 815

Asp Leu Cys Leu Glu Ile His Glu Asp Glu Glu Glu Glu Lys Glu Lys
820 825 830

Gly Asp Gly Asn Glu Cys Ile Glu Glu Gly Glu Asn Phe Arg Tyr Asn
835 840 845

Pro Cys Ser Gly Glu Ser Gly Asn Lys Arg Tyr Pro Val Leu Ala Asn
850 855 860

Lys Val Ala Tyr Gln Met His His Lys Ala Lys Thr Gln Leu Ala Ser
865 870 875 880

Arg Ala Gly Arg Ser Ala Leu Arg Gly Asp Ile Ser Leu Ala Gln Phe
885 890 895

Lys Asn Gly Arg Asn Gly Ser Thr Leu Lys Gly Gln Ile Cys Lys Ile
900 905 910

Asn Glu Asn Tyr Ser Asn Asp Ser Arg Gly Asn Ser Gly Gly Pro Cys
915 920 925

Thr Gly Lys Asp Gly Asp His Gly Gly Val Arg Met Arg Ile Gly Thr
930 935 940

Glu Trp Ser Asn Ile Glu Gly Lys Lys Gln Thr Ser Tyr Lys Asn Val
945 950 955 960

Phe Leu Pro Pro Arg Arg Glu His Met Cys Thr Ser Asn Leu Glu Asn
965 970 975

Leu Asp Val Gly Ser Val Thr Lys Asn Asp Lys Ala Ser His Ser Leu
980 985 990

Leu Gly Asp Val Gln Leu Ala Ala Lys Thr Asp Ala Ala Glu Ile Ile
995 1000 1005

Lys Arg Tyr Lys Asp Gln Asn Asn Ile Gln Leu Thr Asp Pro Ile Gln
1010 1015 1020

Gln Lys Asp Gln Glu Ala Met Cys Arg Ala Val Arg Tyr Ser Phe Ala
1025 1030 1035 1040

Asp Leu Gly Asp Ile Ile Arg Gly Arg Asp Met Trp Asp Glu Asp Lys
1045 1050 1055

Ser Ser Thr Asp Met Glu Thr Arg Leu Ile Thr Val Phe Lys Asn Ile
1060 1065 1070

Lys Glu Lys His Asp Gly Ile Lys Asp Asn Pro Lys Tyr Thr Gly Asp
1075 1080 1085

Glu Ser Lys Lys Pro Ala Tyr Lys Lys Leu Arg Ala Asp Trp Trp Glu
1090 1095 1100

Ala Asn Arg His Gln Val Trp Arg Ala Met Lys Cys Ala Thr Lys Gly
1105 1110 1115 1120

Ile Ile Cys Pro Gly Met Pro Val Asp Asp Tyr Ile Pro Gln Arg Leu
1125 1130 1135

Arg Trp Met Thr Glu Trp Ala Glu Trp Tyr Cys Lys Ala Gln Ser Gln
1140 1145 1150

Glu Tyr Asp Lys Leu Lys Lys Ile Cys Ala Asp Cys Met Ser Lys Gly
1155 1160 1165

Asp Gly Lys Cys Thr Gln Gly Asp Val Asp Cys Gly Lys Cys Lys Ala
1170 1175 1180

Ala Cys Asp Lys Tyr Lys Glu Glu Ile Glu Lys Trp Asn Glu Gln Trp
1185 1190 1195 1200

Arg Lys Ile Ser Asp Lys Tyr Asn Leu Leu Tyr Leu Gln Ala Lys Thr
1205 1210 1215

Thr Ser Thr Asn Pro Gly Arg Thr Val Leu Gly Asp Asp Asp Pro Asp
1220 1225 1230

Tyr Gln Gln Met Val Asp Phe Leu Thr Pro Ile His Lys Ala Ser Ile
1235 1240 1245

Ala Ala Arg Val Leu Val Lys Arg Ala Ala Gly Ser Pro Thr Glu Ile
1250 1255 1260

Ala Ala Ala Ala Pro Ile Thr Pro Tyr Ser Thr Ala Ala Gly Tyr Ile
1265 1270 1275 1280

His Gln Glu Ile Gly Tyr Gly Gly Cys Gln Glu Gln Thr Gln Phe Cys
1285 1290 1295

Glu Lys Lys His Gly Ala Thr Ser Thr Ser Thr Thr Lys Glu Asn Lys
1300 1305 1310

Glu Tyr Thr Phe Lys Gln Pro Pro Pro Glu Tyr Ala Thr Ala Cys Asp
1315 1320 1325

Cys Ile Asn Arg Ser Gln Thr Glu Glu Pro Lys Lys Lys Glu Glu Asn
1330 1335 1340

Val Glu Ser Ala Cys Lys Ile Val Glu Lys Ile Leu Glu Gly Lys Asn
1345 1350 1355 1360

Gly Arg Thr Thr Val Gly Glu Cys Asn Pro Lys Glu Ser Tyr Pro Asp
1365 1370 1375

Trp Asp Cys Lys Asn Asn Ile Asp Ile Ser His Asp Gly Ala Cys Met
1380 1385 1390

Pro Pro Arg Arg Gln Lys Leu Cys Leu Tyr Tyr Ile Ala His Glu Ser
1395 1400 1405

Gln Thr Glu Asn Ile Lys Thr Asp Asp Asn Leu Lys Asp Ala Phe Ile
1410 1415 1420

Lys Thr Ala Ala Ala Glu Thr Phe Leu Ser Trp Gln Tyr Tyr Lys Ser
1425 1430 1435 1440

Lys Asn Asp Ser Glu Ala Lys Ile Leu Asp Arg Gly Leu Ile Pro Ser
1445 1450 1455

Gln Phe Leu Arg Ser Met Met Tyr Thr Phe Gly Asp Tyr Arg Asp Ile
1460 1465 1470

Cys Leu Asn Thr Asp Ile Ser Lys Lys Gln Asn Asp Val Ala Lys Ala
1475 1480 1485

Lys Asp Lys Ile Gly Lys Phe Phe Ser Lys Asp Gly Ser Lys Ser Pro
1490 1495 1500

Ser Gly Leu Ser Arg Gln Glu Trp Trp Lys Thr Asn Gly Pro Glu Ile
1505 1510 1515 1520

Trp Lys Gly Met Leu Cys Ala Leu Thr Lys Tyr Val Thr Asp Thr Asp
1525 1530 1535

Asn Lys Arg Lys Ile Lys Asn Asp Tyr Ser Tyr Asp Lys Val Asn Gln
1540 1545 1550

Ser Gln Asn Gly Asn Pro Ser Leu Glu Glu Phe Ala Ala Lys Pro Gln
1555 1560 1565

Phe Leu Arg Trp Met Ile Glu Trp Gly Glu Glu Phe Cys Ala Glu Arg
1570 1575 1580

Gln Lys Lys Glu Asn Ile Ile Lys Asp Ala Cys Asn Glu Ile Asn Ser
1585 1590 1595 1600

Thr Gln Gln Cys Asn Asp Ala Lys His Arg Cys Asn Gln Ala Cys Arg
1605 1610 1615

Ala Tyr Gln Glu Tyr Val Glu Asn Lys Lys Lys Glu Phe Ser Gly Gln
1620 1625 1630

Thr Asn Asn Phe Val Leu Lys Ala Asn Val Gln Pro Gln Asp Pro Glu
1635 1640 1645

Tyr Lys Gly Tyr Glu Tyr Lys Asp Gly Val Gln Pro Ile Gln Gly Asn
1650 1655 1660

Glu Tyr Leu Leu Gln Lys Cys Asp Asn Asn Lys Cys Ser Cys Met Asp
1665 1670 1675 1680

Gly Asn Val Leu Ser Val Ser Pro Lys Glu Lys Pro Phe Gly Lys Tyr
1685 1690 1695

Ala His Lys Tyr Pro Glu Lys Cys Asp Cys Tyr Gln Gly Lys His Val
1700 1705 1710

Pro Ser Ile Pro Pro Pro Pro Pro Pro Val Gln Pro Gln Pro Glu Ala
1715 1720 1725

Pro Thr Val Thr Val Asp Val Cys Ser Ile Val Lys Thr Leu Phe Lys
1730 1735 1740

Asp Thr Asn Asn Phe Ser Asp Ala Cys Gly Leu Lys Tyr Gly Lys Thr
1745 1750 1755 1760

Ala Pro Ser Ser Trp Lys Cys Ile Pro Ser Asp Thr Lys Ser Gly Ala
1765 1770 1775

Gly Ala Thr Thr Gly Lys Ser Gly Ser Asp Ser Gly Ser Ile Cys Ile
1780 1785 1790

Pro Pro Arg Arg Arg Arg Leu Tyr Val Gly Lys Leu Gln Glu Trp Ala
1795 1800 1805

Thr Ala Leu Pro Gln Gly Glu Gly Ala Ala Pro Ser His Ser Arg Ala
1810 1815 1820

Asp Asp Leu Arg Asn Ala Phe Ile Gln Ser Ala Ala Ile Glu Thr Phe
1825 1830 1835 1840

Phe Leu Trp Asp Arg Tyr Lys Glu Glu Lys Lys Pro Gln Gly Asp Gly
1845 1850 1855

Ser Gln Gln Ala Leu Ser Gln Leu Thr Ser Thr Tyr Ser Asp Asp Glu
1860 1865 1870

Glu Asp Pro Pro Asp Lys Leu Leu Gln Asn Gly Lys Ile Pro Pro Asp
1875 1880 1885

Phe Leu Arg Leu Met Phe Tyr Thr Leu Gly Asp Tyr Arg Asp Ile Leu
1890 1895 1900

Val His Gly Gly Asn Thr Ser Asp Ser Gly Asn Thr Asn Gly Ser Asn
1905 1910 1915 1920

Asn Asn Asn Ile Val Leu Glu Ala Ser Gly Asn Lys Glu Asp Met Gln
1925 1930 1935

Lys Ile Gln Glu Lys Ile Glu Gln Ile Leu Pro Lys Asn Gly Gly Thr
1940 1945 1950

Pro Leu Val Pro Lys Ser Ser Ala Gln Thr Pro Asp Lys Trp Trp Asn
1955 1960 1965

Glu His Ala Glu Ser Ile Trp Lys Gly Met Ile Cys Ala Leu Thr Tyr
1970 1975 1980

Thr Glu Lys Asn Pro Asp Thr Ser Ala Arg Gly Asp Glu Asn Lys Ile
1985 1990 1995 2000

Glu Lys Asp Asp Glu Val Tyr Glu Lys Phe Phe Gly Ser Thr Ala Asp
2005 2010 2015

Lys His Gly Thr Ala Ser Thr Pro Thr Gly Thr Tyr Lys Thr Gln Tyr
2020 2025 2030

Asp Tyr Glu Lys Val Lys Leu Glu Asp Thr Ser Gly Ala Lys Thr Pro
2035 2040 2045

Ser Ala Ser Ser Asp Thr Pro Leu Leu Ser Asp Phe Val Leu Arg Pro
2050 2055 2060

Pro Tyr Phe Arg Tyr Leu Glu Glu Trp Gly Gln Asn Phe Cys Lys Lys
2065 2070 2075 2080

Arg Lys His Lys Leu Ala Gln Ile Lys His Glu Cys Lys Val Glu Glu
2085 2090 2095

Asn Gly Gly Gly Ser Arg Arg Gly Gly Ile Thr Arg Gln Tyr Ser Gly
2100 2105 2110

Asp Gly Glu Ala Cys Asn Glu Met Leu Pro Lys Asn Asp Gly Thr Val
2115 2120 2125

Pro Asp Leu Glu Lys Pro Ser Cys Ala Lys Pro Cys Ser Ser Tyr Arg
2130 2135 2140

Lys Trp Ile Glu Ser Lys Gly Lys Glu Phe Glu Lys Gln Glu Lys Ala
2145 2150 2155 2160

Tyr Glu Gln Gln Lys Asp Lys Cys Val Asn Gly Ser Asn Lys His Asp
2165 2170 2175

Asn Gly Phe Cys Glu Thr Leu Thr Thr Ser Ser Lys Ala Lys Asp Phe
2180 2185 2190

Leu Lys Thr Leu Gly Pro Cys Lys Pro Asn Asn Val Glu Gly Lys Thr
2195 2200 2205

Ile Phe Asp Asp Asp Lys Thr Phe Lys His Thr Lys Asp Cys Asp Pro
2210 2215 2220

Cys Leu Lys Phe Ser Val Asn Cys Lys Lys Asp Glu Cys Asp Asn Ser
2225 2230 2235 2240

Lys Gly Thr Asp Cys Arg Asn Lys Asn Ser Ile Asp Ala Thr Asp Ile
2245 2250 2255

Glu Asn Gly Val Asp Ser Thr Val Leu Glu Met Arg Val Ser Ala Asp
2260 2265 2270

Ser Lys Ser Gly Phe Asn Gly Asp Gly Leu Glu Asn Ala Cys Arg Gly
2275 2280 2285

Ala Gly Ile Phe Glu Gly Ile Arg Lys Asp Glu Trp Lys Cys Arg Asn
2290 2295 2300

Val Cys Gly Tyr Val Val Cys Lys Pro Glu Asn Val Asn Gly Glu Ala
2305 2310 2315 2320

Lys Gly Lys His Ile Ile Gln Ile Arg Ala Leu Val Lys Arg Trp Val
2325 2330 2335

Glu Tyr Phe Phe Glu Asp Tyr Asn Lys Ile Lys His Lys Ile Ser His
2340 2345 2350

Arg Ile Lys Asn Gly Glu Ile Ser Pro Cys Ile Lys Asn Cys Val Glu
2355 2360 2365

Lys Trp Val Asp Gln Lys Arg Lys Glu Trp Lys Glu Ile Thr Glu Arg
2370 2375 2380

Phe Lys Asp Gln Tyr Lys Asn Asp Asn Ser Asp Asp Asp Asn Val Arg
2385 2390 2395 2400

Ser Phe Leu Glu Thr Leu Ile Pro Gln Ile Thr Asp Ala Asn Ala Lys
2405 2410 2415

Asn Lys Val Ile Lys Leu Ser Lys Phe Gly Asn Ser Cys Gly Cys Ser
2420 2425 2430

Ala Ser Ala Asn Glu Gln Asn Lys Asn Gly Glu Tyr Lys Asp Ala Ile
2435 2440 2445

Asp Cys Met Leu Lys Lys Leu Lys Asp Lys Ile Gly Glu Cys Glu Lys
2450 2455 2460

Lys His His Gln Thr Ser Asp Thr Glu Cys Ser Asp Thr Pro Gln Pro
2465 2470 2475 2480

Gln Thr Leu Glu Asp Glu Thr Leu Asp Asp Asp Ile Glu Thr Glu Glu
2485 2490 2495

Ala Lys Lys Asn Met Met Pro Lys Ile Cys Glu Asn Val Leu Lys Thr
2500 2505 2510

Ala Gln Glu Glu Asp Glu Gly Gly Cys Val Pro Ala Glu Asn Ser Glu
2515 2520 2525

Glu Pro Ala Ala Thr Asp Ser Gly Lys Glu Thr Pro Glu Gln Thr Pro
2530 2535 2540

Val Leu Lys Pro Glu Glu Glu Ala Val Pro Glu Pro Pro Pro Pro Pro
2545 2550 2555 2560

Pro Gln Glu Lys Ala Pro Ala Pro Ile Pro Gln Pro Gln Pro Pro Thr
2565 2570 2575

Pro Pro Thr Gln Leu Leu Asp Asn Pro His Val Leu Thr Ala Leu Val
2580 2585 2590

Thr Ser Thr Leu Ala Trp Ser Val Gly Ile Gly Phe Ala Thr Phe Thr
2595 2600 2605

Tyr Phe Tyr Leu Lys Val Asn Gly Ser Ile Tyr Met Gly Met Trp Met
2610 2615 2620

Tyr Val Asp Val Cys Glu Cys Met Trp Met Tyr Val Asp Val Cys Gly
2625 2630 2635 2640

Cys Val Leu Trp Ile Cys Ile Cys Asp Tyr Val Trp Ile Tyr Ile Tyr
2645 2650 2655

Ile Tyr Ile Cys Leu Cys Ile Cys Val Phe Gly Tyr Ile Tyr Val Tyr
2660 2665 2670

Val Tyr Asp Phe Leu Tyr Met Tyr Leu Trp Val Lys Asp Ile Tyr Ile
2675 2680 2685

Trp Met Tyr Leu Tyr Val Phe Tyr Ile Tyr Ile Leu Tyr Ile Cys Ile
2690 2695 2700

Tyr Ile Lys Lys Glu Ile
2705 2710



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19124 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 

ACATTTTTTC GTAATATATA TATATATATA TATATATAAT TCTCTTTTTC TAATATATAT 60

ATCCTTCTAT TTTCGATTTT TTCATTTTTT TCCAGTATTA ATTTATTTAT TTATTTGTGA 120 
TATTTTATAA TATATTATTT AAATGTGTAT TTATATATGT GTTTTATTTT TGTTATTAAT 180 
TTGAATAATC CGAGCGAAAA AAAATATATA ATCTCATATA AAAATTATTT ATAATACAAT 240 
ATTATATAGT TTCCTATTAA AATAAATTAA TATAATATAC AATAATATTT CTTGTTATTT 300 
TTATAAATAT AACTAATTTC TTATTTTTAT TTAACTTTAT TCCTTTTTAA TTTCTTAATT 360
CTTTTATGCA AACAAAAAAC ATAAAGTAAT TCTACATATC AACAAAAAAA AAAAAAAAAA 420 
AAAAAAAAAA ATTTATTATA ATATAATAAA AAATATAAAG ACATACGTTC ACTTATTATT 480
ATAAATGATT TATTACGATT AAAACATATT GAGATTATAA TAATATAATT TAACATAGAA 540 
AGAGTTAAGA ATACATTTTT TTTTTTTTTT TGATATGTAA TTCAACATAT ATATATATAT 600 
ATATCTTTTT AATTTAATTA AATAAAATTC CTTATTATTC ATATTGTTTC TTTTATCACA 660 
TGTGAAATAT TAAAAATAAT TTTCGATTTT ATCGATATAT TTATGTCGTT TATATACTTA 720 
TATAGGTCTT TATAACTATT GATTAATAGA AGGTAATAGC CTAATAATAT AAATACTCGT 780 
ATTTATAAAT TCATTTATAT ATTTCAAATA TATTTCGATG GTTTATTTTC AAATACAATT 840 
AATTAGATTT CTTAAATATT TCTTCATTTA TTCATTTTTA TAGCATATAC ATGCACATTA 900
TAAATTATTA ATAAAAAATT TTTATTTTAA TATATAATAA CAATTTTCAT ACATTACATT 960 
TTTCACACAA CATTTAAGTT GTCATAATGT AACACATTAA ATAATATATT ACTTATATAT 1020 
ATATAATTAT TAATTATATA TTAAATAAAA ATGTATTATC GCCTGTATTA TCATAGTATA 1080 
TATAATGTTG TATAACGCTT CAAAATATAT ATAATAATAT AATTAAAAAT ATATATATAG 1140 
TAATTAATTA TTTTGTTATG TTATGTAATA ATGCAATTAA TATAAGATAA AATTCTATAG 1200 
CTATTATTTA AAATATATAT ATATATATAT ATATATATAT ATATTAGTAT ATGTTATCAA 1260 
AATATTATAA TATGTAAATT ATTAATAAAA TATATTTGTA TAACATACAA GACTAAAGAA 1320 
AAv_r±ATACAA TCTGGTATCT AATAGTATAT ATATATAATA TCT"rrrrrAx xtaattgttc 1380 
TCTCTTTTTT TTTTTTTTAA ATAATAATAA ATATTAATAT ATTTTTTTTC ATAATTATAT 1440 
GATTTAGTAT TTTAATAATA AATAAATCTT TTAAAAAACT TCAAAACATT TTTGCATAAA 1500 
ATAATATTAA TATTAGTAAC CACCTAGATA AATTAGAGAG AAACGTAGAA CATACCAAAA 1560 
AAAATTAGAA CAAAAAGAAT ATTACAAAAA ATAATAAAAT TAAATTATTT CTTTACTATT 1620 
AATTTAAAGT TTTTTTTCAT ATCATATATT ATGATACACA ATGTTTGTTG TTAAATGTTT 1680
TATATACATG CAATGATATG TTTCTGTTGG AATATGTATT ATATACTTAT ATGTTCTAAT 1740
AAATGTATTG TACACCTTTA GCAACTATTA CTACACACAT TTTTATATAA TTTATAACAG 1800
GAAAATATGT TATATTATTA CAATATCTTA ATGTGTTTTT GCAAAAATAT AAAAAACAAG 1860
AAAATTACAA TTGTAATTAA TCGTATGACA TAAAATTATA TTATATTAGA AATTAAAATT 1920
CAAAATTATA AAAAATATGG AAATGTTTTG TTATATTATT TTTTTAAAAA TTTAATTATT 1980
TTATTTTATT ATTTATTTTT TTTTTTTTTT GTGTTCTAAA TAAAAAGGCA AATATGATTC 2040
AAGTAAAAAA TATATATATT TACATAATGG CAAAATAATT GTTTATTATA TTATATGACT 2100
ATAATAATAT TTTAGATTAA ACATATGTAA TTCATTTAAC AGAATAAAAT AAAATATTAT 2160
ATATATATAT TAATTATTAA GTTATAGATT TAATAAAAAT ATATTATACA TATGAGATTA 2220
AAAATGAAAG TTCACTACAG TAATATATTA TTATATGTCG TCAATTTAAG TATATTCTTA 2280
ATATCACGTA TGCACTAAAT AATGACAATA ATAATATATA TGTAACATTT TATAATTGAT 2340
GTAAATAAAA AAATATACAT ATATACAAAA ACATATATGA TATTTACATT CTTTTTTATA 2400
GATAAATATC CAGAAGAACT ATTACATCAC TTCACTTCAT ATACCAAACA CGAAAAAAAT 2460
ACAACCACTA GGTTATTATG CGAATGTGAC TTATATACGT CCATTTATGA TAATGACCCG 2520
GAAATGATAT TAGTGATGGA AAATTTCAAT AAACAGACAG AAGAAAGGTT TCATGAATAC 2580
AATGAACGCA TGCAAGAAAA ACGAAAAATA TGTAAAGAAC AATGCGAAAA GGATATACAA 2640
AAAATTATTT TAAAAGATAA AATCGAAAAG GAATTAACAG AAAAGTTAGA GGCATTGGAA 2700
ACGAATATAA AGACTGAGGA TATACCTACT TGTGTATGCG AAAAATCAGT AGCAGATAAA 2760
GTGGAAAAAA CGTGTTTGAA ATGTGGAGGT ATATTGGGTG TTGGTGTGAC TCCATCTTTA 2820
GGTTTATTAG GAGAAATAGG TGGACTTGTT ATAAATAATT GGACAAATAC TCCTTTTTAT 2880
AAAGCTTTTC TTACTTTTGC TCAAAAGGAA GGTATAGCTG CCGGTAAAAT TGCTAGTGAT 2940
ACTGCTCGTA TTGATACAGT TATTTAAGGA ATAATATCAA ATTTTGATGT GCACACTATA 3000
AATGGTTCTA CGTTGGGGAA AGTTATTACC GTAGAAGCTC TTAAGGATGA CACTACTCTT 3060
ACTACGGCAC TATATAATGA ATATGTAAGC ATGTGTGTAA ATACGAACCC TGTCGAAGAC 3120
AAATTAATTT GTGCTTTTGG GATGAGAGAC GGTCTAGTTG CAGGGCAATA TGCTTCATCG 3180
CGAGACGTTA TAGGATCAAG TGTAAAAGGA ATTATTAGAA AAGCTGCAAA CGCTGCTTCA 3240
CAAGCTGCTG AGACAGCTGC TAACGAAACT ACTTCCGGAA TGATCGAAGC CGAGTTAAGT 3300
AAAATAACAT CTGCAGGTGC TAATTTACAC AGTGCAATTA CTTACTCAGT AACTGCGATA 3360
TTGGTTATAG TTTTGGTTAT GGTAATTATT TATTTAATAT TACGTTATCG TAGAAAAAAA 3420
AAAATGAAGA AAAAATTGCA ATATATAAAA TTATTAAAGG AATAGATATA CGATGTCGAG 3480
CTATTAGCGG TAATTTAAAG TATTGTGAAT TTTTCATTTA ATATGCTATG ATCATTTGAT 3540
AATTAATTTT TTTTTATAAT ATTATATTTT TTTATACCTT GGATTCTTAC ATTGTTTTAT 3600
TATTATATGA TTATTTAATT ATTATACTTA TATATATATA TATTTTTACA TTAAGATATT 3660
ATATATGTAT CTATCTATCT ATCTATCTAT ATATATATAT ATATATATAT ATTATAATAA 3720
TTATTATTAT TAGATGCATA TTAGTGATGA TTATAATAAT AACCTATTGA AGAGAATAGA 3780
ACATAATAAT ATATTAAATT AATAGAACTT CATTTTTATT GTTATATGTA TATAAAAATA 3840
AGAAATTTGA AAAAGTAATT TACACATGAT AATGTATTTT ATTTTATTTG TGTTGTTTTA 3900
TATTTATTTA TAAAAATTGT TTAATATAAG TTGTTATTAT AATTTTTTAA TATGGCACCA 3960
TTAGCTTTCC ATTATACAAA TATATATTTC CTCATTAGAA TCTGAATATT TATTGTATTA 4020
TAAAAAAAGT ATAATATAAT AAAATATCTA AGATTTTTTC TAATTTGTTT AATTTATAAT 4080
AAATTTTAAT TTTATACGAT AGAATAAATT ATAATCAACA TATATATATG TATTCATCTT 4140
AAGAACCTAT TACAATATAG TAACAACTGG TTCCTTTTTA TTATAAATAA CATAAGAATG 4200
TGTAAAAGGA TAGTTGTTAA AGGCTTTTTT AATATTGATT ATAAATGTTT GTAAGATATA 4260
TATAATAGAT ATCTTAACAT ACAACTTTGC ATAATTGTAA TTAAAAAAAT ATATATAATA 4320
AGAAATATTA TAAATAATAT TATAAAAAAT TAAGCATAAA TGTCACAATA AATTTTTTTT 4380
TATTAATTTA ATTTTATTTT ATTGTTCTAA AATATATTGA TTATGAGAAT ATTATTTGTG 4440
TCTAATATAA TTAAGATATT TCTAATATTA ATTTATATAT ATATATTTAA AAGTATTTTA 4500
AGAATAATTT TTTACTTATT TATTATAATA TGAAATATGC ATGGAGTATA TATAAATATT 4560
GATGACAAAA AAAAAACTTT TAAAATGGAA AATATGCATA TAATAAAATA CTATATAGTA 4620
TAATTGGTGA AATAGTTGTA ACTTATACAA ACATGTTGCA TTCATAATTT AGAGATTATG 4680
TAATATTGTT TATGTATCGT AATATATATT AATATAATTG TTTTTTTAGT ATGTATGGTA 4740
TTCTAATAAT ATATTCATAT GTAGTCATAG TGTCAATGAA TATAAAATAT GGTATATTTA 4800
TATTATTGTA TATATTAAAT AAGTAACACA GAACATTATA TATAGTAATA AATAGAAGAA 4860
ATAATATATT TTTATGTTAT ATATTATTAG TTATTATAAA GGGGAAAATT CATAATATTT 4920
ATGAAAATTT TTGTATATGA TATAGTTATA AGTTAAAAAA AAAAAAAAAC AAGAACAAAA 4980
ATGGAAAGCA TAAAAAATGT TACTGTAATA GGATAAAATA TATTATATAA AATGTTTATT 5040
TTATCTTAAA AAGGTTCCTA TTATAACATT AAAAAAAATT TGTCCCATTT TATAAATAAT 5100
TAACTACATT TACATAATGA AATTTCGATT TTGTGTTTTT TTGATGAATA TTATGGACTA 5160
AlTA-nTATA 'WUTiAA'l-UUi ridATATAA TAATAATAAT TUXATTTAAA AAAATGAAAA 5220
ATAAGAAATA AATATCCTGA TTTTGTAGTT CCAATAGCTT AATATAATTA TGGACTCATA 5280
TATATATTAT ATATATCTTT ACAACAAGTA ATAAGTAAAT ATTATTTTAA TCTTAATAAG 5340
GAAAATAAAA ATAATAAAAT AAGAATACTG AATAATAAGT CATATTATAC ATTTTTTAAA 5400
AATGTAACAT AATTACAAAT ACGTAACATG TATTATAGAA ATAATAAGAA TTTAATATTA 5460
AGGATAAATA TAAATATTTA AAATTATATT TTTTTATGTC AATTTATGTT ATATTATATT 5520
ATATTAACAT GATTAGTTTT TTGAAAAATA TTTAAATATC ATATAATAAT AATAAATTAG 5580
TTAAAATAAT AGTATTTCAT ACAAAATACT AACTTATAAG TATATCATAT AATATTATAT 5640 
ATATATATAT TTATGTGTTT TTGATTGGGT GTATATAAGG CTATAAGTAT ATATGGGTTG 5700 
TTCATTATAT ATTTATATGT GAATAGATAC ATATAAGTTA ATATATTTAT TTGTGTATAT 5760 
GTCTGTGTTA AGATAGATAT GCATTACAGT TAAGGGTTAT AGTTTTTTTT TTTTTTTTTT 5820
GTACATATAT ATAAAAAATA GATAACTAAC AATATGCATA TTACAAGAAT AATATTTGTA 5880 
TAAAATATAT ATATATATAT ATATATAAAG ACATTAAAAC TATACTAATA GGTAATTAGT 5940
TTTATTATAT CATCCTTTTA TTATTATAAT TTTTTTTGTT TTACTTCTTG TCGTTCTTTT 6000 
TTGTTATTAT AATATAACAA ATATAAAACA ATATCAGTAT TTGGAATATA AATAAATTTA 6060 

TTCTACATAT ATGCATATAT ATATATATAT ATATATATAT ATATATATAT ATATATATAT 6120
ATATGTATGA TTTTATACTA TTTTTATACA TGCATTTTTA TATATTTTAG TATATACTTT 6180 
AAAGATATTA TTAATATTTA TATAGTAGCA TATATGTATT TATATTATAA CAAATATTTT 6240 
CATTTATATA AATATATAGA ACATGAACAT TTTATTAATA ACTCATATTT GAATATATAT 6300 
ATTTATAATG TGTATTTTTA CTTATTTTTT TATATTATAC AATAAAATTT TGAAATTCAT 6360 

AAAATGCATG AAATACATAA AAAAATACAA CAAAACAAAT GATAAAAACA TTTTTATTAA 6420
TATAATATAA TATAATATAA TAATATATTT TTCCTGTTAT TTATTTATCA TTTTTTTTTT 6480
GATGCTATAT ATATTATTAT ATAATAAATT ATAATATATA ACAACAAAAA TTAATAATAA 6540 
TAATATACTA CTTTTAATAT AATACAACAA TACAAAGAAT ATGTATCTAT ATCAATTATA 6600 
TATATATGAA TATATAAATA TGATAGATAA TATAGATAGA GAGAAACGAA GAACATATTT 6660 

GTCTCTTTTG TTATCTCTAA TATATATATA TATATAATAA ATTAAAATAA AGTCAAAAAA 6720
AATATACATA TATTAATGTT AATAATTAAA TATATAAACA CGTTGCATAT ATACTTTTTT 6780 
ATATGTTTGT ATTTTCGTAT TTTTTTTTTC TCATTTATAA TTTTACTTAA TAAATAAAAC 6840 
ATAAAAAAAA TAATATATAT ATAATTAAAT AGATAAATAA AGGAATACAT AAAATATAAT 6900 
ATTTCTGATT ATATTTTTTT TTTGTTAGAA TATTTAAATT TATTATAAAT TTATTAATAT 6960 

ATATATATAT TTTTTTTAAA AATATATAAA ACTAATAATT ATTATTATAT ACATATTAAA 7020
TATTATTTTT TTAACATATA CATATATTGT AATATTATAA TAGTACAACT ATTAATATAT 7080 
ATATATATAT ATATACAATA TTTATATATA TTGTAATACA TAAATTATAC CTTACATATA 7140 
TATATACATT CACAAAAGTG TTATTATTCT TATTCTACCA TATTATAATA CTACTGTAAT 7200 
ATACATATAT ACATACCCCC ACGTACGTAC GAAACACCAC CAAACCATGT ATCACGTATG 7260 

TATGTATGCC ACGATATAAA CCACGTACCA CGTATGACAT AATGTAATGG TGGAGTTAGC 7320 
AAAAATGGGG CCCAAGGAGG CTGCAGGTGG GGATGATATT GAGGATGAAA GTGCCAAACA 7380 
TATGTTTGAT AGGATAGGAA AAGATGTGTA CGATAAAGTA AAAGAGGAAG CTAAAGAACG 7440 
TGGTAAAGGC TTGCAAGGAC GTTTGTCAGA AGCAAAATTT GAGAAAAATG AAAGCGATCC 7500 
ACAAACACCA GAAGATCCAT GCGATCTTGA TCATAAATAT CATACAAATG TAACTACTAA 7560 

TGTAATTAAT CCGTGCGCTG ATAGATCTGA CGTGCGTTTT TCCGATGAAT ATGGAGGTCA 7620 
ATGTACACAT AATAGAATAA AAGATAGTCA ACAGGGTGAT AATAAAGGTG CATGTGCTCC 7680 
ATATAGGCGA TTGCATGTAT GCGATCAAAA TTTAGAACAG ATAGAGCCTA TAAAAATAAC 7740 
AAATACTCAT AATTTATTGG TAGATGTGTG TATGGCAGCA AAATTTGAAG GACAATCAAT 7800 
AACACAAGAT TATCCAAAAT ATCAAGCAAC ATATGGTGAT TCTCCTTCTC AAATATGTAC 7860 

TATGCTGGCA CGAAGTTTTG CGGACATAGG GGACATTGTC AGAGGAAGAG ATTTGTATTT 7920 
AGGTAATCCA CAAGAAATAA AACAAAGACA ACAATTAGAA AATAATTTGA AAACAATTTT 7980 
CGGGAAAATA TATGAAAAAT TGAATGGCGC AGAAGCACGC TACGGAAATG ATCCGGAATT 8040 
TTTTAAATTA CGAGAAGATT GGTGGACTGC TAATCGAGAA ACAGTATGGA AAGCCATCAC 8100 
ATGTAACGCT TGGGGTAATA CATATTTTCA TGCAACGTGC AATAGAGGAG AACGAACTAA 8160 

AGGTTACTGC CGGTGTAACG ACGACCAAGT TCCCACATAT TTTGATTATG TGCCGCAGTA 8220 
TCTTCGCTGG TTCGAGGAAT GGGCAGAAGA TTTTTGTAGG AAAAAAAATA AAAAAATAAA 8280 
AGATGTTAAA AGAAATTGTC GTGGAAAAGA TAAAGAGGAT AAGGATCGAT ATTGTAGCCG 8340 
TAATGGCTAC GATTGCGAAA AAACTAAACG AGCGATTGGT AAGTTGCGTT ATGGTAAGCA 8400 
ATGCATTAGC TGTTTGTATG CATGTAATCC TTACGTTGAT TGGATAAATA ACCAAAAAGA 8460 

ACAATTTGAC AAACAGAAAA AAAAATATGA TGAAGAAATA AAAAAATATG AAAATGGAGC 8520 
ATCAGGTGGT AGTAGGCAAA AACGGGATGC AGGTGGTACA ACTACTACTA ATTATGATGG 8580 
ATATGAAAAA AAATTTTATG ACGAACTTAA TAAAAGTGAA TATAGAACCG TTGATAAATT 8640 
TTTGGAAAAA TTAAGTAATG AAGAAATATG CACAAAAGTT AAAGACGAAG AAGGAGGAAC 8700 
AATTGATTTT AAAAACGTTA ATAGTGATAG TACTAGTGGT GCTAGTGGCA CTAATGTTGA 8760 

AAGTCAAGGA ACATTTTATC GTTCAAAATA TTGCCAACCC TGCCCTTATT GTGGAGTGAA 8820 

AAAGGTAAAT AATGGTGGTA GTAGTAATGA ATGGGAAGAG AAAAATAATG GCAAGTGCAA 8880 

GAGTGGAAAA CTTTATGAGC CTAAACCCGA CAAAGAAGGT ACTACTATTA CAATCCTTAA 8940 

AAGTGGTAAA GGACATGATG ATATTGAAGA AAAATTAAAC AAATTTTGTG ATGAAAAAAA 9000 
[line illegible in source] 9060 

TAGACAGGAA TTGTATGAAG AATGGAAATG TTATAAAGGT GAAGATGTAG TGAAAGTTGG 9120 
ACACGATGAG GATGACGAGG AGGATTATGA AAATGTAAAA AATGCAGGCG GATTATGTAT 9180 
ATTAAAAAAC CAAAAAAAGA ATAAAGAAGA AGGTGGAAAT ACGTCTGAAA AGGAGCCTGA 9240 
TGAAATCCAA AAGACATTCA ATCCTTTTTT TTACTATTGG GTTGCACATA TGTTAAAAGA 9300 
TTCCATACAT TGGAAAAAAA AACTTCAGAG ATGTTTACAA AATGGTAACA GAATAAAATG 9360 
TGGAAACAAT AAATGTAATA ATGATTGTGA ATGTTTTAAA AGATGGATTA CACAAAAAAA 9420 
AGACGAATGG GGGAAAATAG TACAACATTT TAAAACGCAA AATATTAAAG GTAGAGGAGG 9480 
TAGTGACAAT ACGGCAGAAT TAATCCCATT TGATCACGAT TATGTTCTTC AATACAATTT 9540 
GCAAGAAGAA TTTTTGAAAG GCGATTCCGA AGACGCTTCC GAAGAAAAAT CCGAAAATAG 9600 
TCTGGATGCA GAGGAGGCAG AGGAACTAAA ACACCTTCGC GAAATCATTG AAAGTGAAGA 9660 
CAATAATCAA GAAGCATCTG TTGGTGGTGG CGTCACTGAA CAAAAAAATA TAATGGATAA 9720 
ATTGCTCAAC TACGAAAAAG ACGAAGCCGA TTTATGCCTA GAAATTCACG AAGATGAGGA 9780 
AGAGGAAAAA GAAAAAGGAG ACGGAAACGA ATGTATCGAA GAGGGCGAAA ATTTTCGTTA 9840 
TAATCCATGT AGTGGCGAAA GTGGTAACAA ACGATACCCC GTTCTTGCGA ACAAAGTAGC 9900 

GTATCAAATG CATCACAAGG CAAAGACACA ATTGGCTAGT CGTGCTGGTA GAAGTGCGTT 9960 
GAGAGGTGAT ATATCCTTAG CGCAATTTAA AAATGGTCGT AACGGAAGTA CATTGAAAGG 10020 
ACAAATTTGC AAAATTAACG AAAACTATTC CAATGATAGT CGTGGTAATA GTGGTGGACC 10080 
ATGTACAGGC AAAGATGGAG ATCACGGAGG TGTGCGCATG AGAATAGGAA CGGAATGGTC 10140 
AAATATTGAA GGAAAAAAAC AAACGTCATA CAAAAACGTC TTTTTACCTC CCCGACGAGA 10200 

ACACATGTGT ACATCCAATT TAGAAAATTT AGATGTTGGT AGTGTCACTA AAAATGATAA 10260 
GGCTAGCCAC TCATTATTGG GAGATGTTCA GCTCGCAGCA AAAACTGATG CAGCTGAGAT 10320 
AATAAAACGC TATAAAGATC AAAATAATAT ACAACTAACT GATCCAATAC AACAAAAAGA 10380 
CCAGGAGGCT ATGTGTCGAG CTGTACGTTA TAGTTTTGCC GATTTAGGAG ACATTATTCG 10440 
AGGAAGAGAT ATGTGGGATG AGGATAAGAG CTCAACAGAC ATGGAAACAC GTTTGATAAC 10500 

CGTATTTAAA AACATTAAAG AAAAACATGA TGGAATCAAA GACAACGCTA AATATACCGG 10560 
TGATGAAAGC AAAAAGCCCG CATATAAAAA ATTACGAGCA GATTGGTGGG AAGCAAATAG 10620 
ACATCAAGTG TGGAGAGCCA TGAAATGCGC AACAAAAGGC ATCATATGTC CTGGTATGCC 10680 
AGTTGACGAT TATATCCCCC AACGTTTACG CTGGATGACT GAATGGGCTG AATGGTATTG 10740 
TAAAGCGCAA TCACAGGAGT ATGACAAGTT AAAAAAAATC TGTGCAGATT GTATGAGTAA 10800 

GGGTGATGGA AAATGTACGC AAGGTGATGT CGATTGTGGA AAGTGCAAAG CAGCATGTGA 10860 
TAAATATAAA GAGGAAATAG AAAAATGGAA TGAACAATGG AGAAAAATAT CAGATAAATA 10920 
CAATCTATTA TACCTACAAG CAAAAACTAC TTCTACTAAT CCTGGCCGTA CTGTTCTTGG 10980 
TGATGACGAT CCCGACTATC AACAAATGGT AGATTTTTTG ACCCCAATAC ACAAAGCAAG 11040 
TATTGCCGCA CGTGTTCTTG TTAAACGTGC TGCTGGTAGT CCCACTGAGA TCGCCGCCGC 11100 

CGCCCCGATC ACCCCCTACA GTACTGCTGC CGGATATATA CACCAGGAAA TAGGATATGG 11160 
GGGGTGCCAG GAACAAACAC AATTTTGTGA AAAAAAACAT GGTGCAACAT CAACTAGTAC 11220 
CACGAAAGAA AACAAAGAAT ACACCTTTAA ACAACCTCCG CCGGAGTATG CTACAGCGTG 11280 
TGATTGCATA AATAGGTCGC AAACAGAGGA GCCGAAGAAA AAGGAAGAAA ATGTAGAGAG 11340 
TGCCTGCAAA ATAGTGGAGA AAATACTTGA GGGTAAGAAT GGAAGGACTA CAGTAGGTGA 11400 

ATGTAATCCA AAAGAGAGTT ATCCTGATTG GGATTGCAAA AACAATATTG ACATTAGTCA 11460 
TGATGGTGCT TGTATGCCTC CAAGGAGACA AAAACTATGT TTATATTATA TAGCACATGA 11520 
GAGTCAAACA GAAAATATAA AAACAGACGA TAATTTGAAA GATGCTTTTA TTAAAACTGC 11580 
AGCAGCAGAA ACTTTTCTTT CATGGCAATA TTATAAGAGT AAGAATGATA GTGAAGCTAA 11640 
AATATTAGAT AGAGGCCTTA TTCCATCCCA ATTTTTAAGA TCCATGATGT ACACGTTTGG 11700 

AGATTATAGA GATATATGTT TGAACACAGA TATATCTAAA AAACAAAATG ATGTAGCTAA 11760 
GGCAAAAGAT AAAATAGGTA AATTTTTCTC AAAAGATGGC AGCAAATCTC CTAGTGGCTT 11820 
ATCACGCCAA GAATGGTGGA AAACAAATGG TCCAGAGATT TGGAAAGGAA TGTTATGTGC 11880 
CTTAACAAAA TACGTCACAG ATACCGATAA CAAAAGAAAA ATCAAAAACG ACTACTCATA 11940 
CGATAAAGTC AACCAATCCC AAAATGGCAA CCCTTCCCTT GAAGAGTTTG CTGCTAAACC 12000 

TCAATTTCTA CGTTGGATGA TCGAATGGGG AGAAGAGTTT TGTGCTGAAC GTCAGAAGAA 12060 
GGAAAATATC ATAAAAGATG CATGTAATGA AATAAATTCT ACACAACAGT GTAATGATGC 12120 
GAAACATCGT TGTAATCAAG CATGTAGAGC ATATCAAGAA TATGTTGAAA ATAAAAAAAA 12180 
AGAATTTTCG GGACAAACAA ATAACTTTGT TCTAAAGGCA AATGTTCAGC CCCAAGATCC 12240 
AGAATATAAA GGATATGAAT ATAAAGACGG CGTACAACCG ATACAGGGGA ATGAGTATTT 12300 

ACTGCAAAAA TGTGATAATA ATAAATGTTC TTGCATGGAT GGAAATGTAC TTTCCGTCTC 12360 
TCCAAAAGAA AAACCTTTTG GAAAATATGC CCATAAATAT CCTGAGAAAT GTGATTGTTA 12420 
TCAAGGAAAA CATGTACCTA GCATACCACC TCCCCCCCCA CCTGTACAAC CACAACCGGA 12480 
AGCACCAACA GTAACAGTAG ACGTTTGCAG CATAGTAAAA ACACTATTTA AAGACACAAA 12540 
CAATTTTTCC GACGCTTGTG GTCTAAAATA CGGCAAAACC GCACCATCCA GTTGGAAATG 12600 

TATACCAAGT GACACAAAAA GTGGTGCTGG TGCCACCACC GGCAAAAGTG GTAGTGATAG 12660 
TGGTAGTATT TGTATCCCAC CCAGGAGGCG ACGATTATAT GTGGGGAAAC TACAGGAGTG 12720 
GGCTACCGCG CTCCCACAAG GTGAGGGCGC CGCGCCGTCC CACTCACGCG CCGACGACTT 12780 
GCGCAATGCG TTCATCCAAT CTGCTGCAAT AGAGACTTTT TTCTTATGGG ATAGATATAA 12840 
AGAAGAGAAA AAACCACAGG GTGATGGGTC ACAACAAGCA CTATCACAAC TAACCAGTAC 12900 

ATACAGTGAT GACGAGGAGG ACCCCCCCGA CAAACTGTTA CAAAATGGTA AGATACCCCC 12960 
CGATTTTTTG AGATTAATGT TCTATACATT AGGAGATTAT AGGGATATTT TAGTACACGG 13020 
TGGTAACACA AGTGACAGTG GTAACACAAA TGGTAGTAAC AACAACAATA TTGTGCTTGA 13080 
AGCGAGTGGT AACAAGGAGG ACATGCAAAA AATACAAGAG AAAATAGAAC AAATTCTCCC 13140 
AAAAAATGGT GGCACACCTC TTGTCCCAAA ATCTAGTGCC CAAACACCTG ATAAATGGTG 13200 
GAATGAACAC GCCGAATCTA TCTGGAAAGG TATGATATGT GCATTGACAT ATACAGAAAA 13260 
GAACCCTGAC ACCAGTGCAA GAGGCGACGA AAACAAAATA GAAAAGGATG ATGAAGTGTA 13320 
CGAGAAATTT TTTGGCAGCA CAGCCGACAA ACATGGCACA GCCTCAACCC CAACCGGCAC 13380 
ATACAAAACC CAATACGACT ACGAAAAAGT CAAACTTGAG GATACAAGTG GTGCCAAAAC 13440 
CCCCTCAGCC TCTAGTGATA CACCCCTTCT CTCCGATTTC GTGTTACGCC CCCCCTACTT 13500 
CCGTTACCTT GAAGAATGGG GTCAAAATTT TTGTAAAAAA AGAAAGCATA AATTGGCACA 13560 
AATAAAACAT GAGTGTAAAG TAGAAGAAAA TGGTGGTGGT AGTCGTCGTG GTGGTATAAC 13620 
AAGACAATAT AGTGGGGATG GCGAAGCGTG TAATGAGATG CTTCCAAAAA ACGATGGAAC 13680 
TGTTCCGGAT TTAGAAAAGC CGAGTTGTGC CAAACCTTGT AGTTCTTATA GAAAATGGAT 13740 
AGAAAGCAAG GGAAAAGAGT TTGAGAAACA AGAAAAGGCA TATGAACAAC AAAAAGACAA 13800 
ATGTGTAAAT GGAAGTAATA AGCATGATAA TGGATTTTGT GAAACACTAA CAACGTCCTC 13860 
TAAAGCTAAA GACTTTTTAA AAACGTTAGG ACCATGTAAA CCTAATAATG TAGAGGGTAA 13920 
AACAATTTTT GATGATGATA AAACCTTTAA ACATACAAAA GATTGTGATC CATGTCTTAA 13980 
ATTTAGTGTT AATTGTAAAA AAGATGAATG TGATAATTCT AAAGGAACCG ATTGCCGAAA 14040 
TAAAAATAGT ATTGATGCAA CAGATATTGA AAATGGAGTG GATTCTACTG TACTAGAAAT 14100 
GCGTGTCAGT GCTGATAGTA AAAGTGGATT TAATGGTGAT GGTTTAGAGA ATGCTTGTAG 14160 
AGGTGCTGGT ATCTTTGAAG GTATTAGAAA AGATGAATGG AAATGTCGTA ATGTATGTGG 14220 
TTATGTTGTA TGTAAACCGG AAAACGTTAA TGGGGAAGCA AAGGGAAAAC ACATTATACA 14280 
AATTAGAGCA CTGGTTAAAC GTTGGGTAGA ATATTTTTTT GAAGATTATA ATAAAATAAA 14340 
ACATAAAATT TCACATCGCA TAAAAAATGG TGAAATATCT CCATGTATAA AAAATTGTGT 14400 
AGAAAAATGG GTAGATCAGA AAAGAAAAGA ATGGAAGGAA ATTACTGAAC GTTTCAAAGA 14460 
TCAATATAAA AATGACAATT CAGATGATGA CAATGTGAGA AGTTTTTTGG AGACCTTGAT 14520 
ACCTCAAATT ACTGATGCAA ACGCTAAAAA TAAGGTTATA AAATTAAGTA AGTTCGGTAA 14580 
TTCTTGTGGA TGTAGTGCCA GTGCGAACGA ACAAAACAAA AATGGTGAAT ACAAGGACGC 14640 
TATAGATTGT ATGCTTAAAA AGCTTAAAGA TAAAATTGGC GAGTGCGAAA AGAAACACCA 14700 
TCAAACTAGT GATACCGAGT GTTCCGACAC ACCACAACCG CAAACCCTTG AAGACGAAAC 14760 
TTTGGATGAT GATATAGAAA CAGAGGAGGC GAAGAAGAAC ATGATGCCGA AAATTTGTGA 14820 
AAATGTGTTA AAAACAGCAC AACAAGAGGA TGAAGGCGGT TGTGTCCCAG CAGAAAATAG 14880 
TGAAGAACCG GCAGCAACAG ATAGTGGTAA GGAAACCCCC GAACAAACCC CCGTTCTCAA 14940 
ACCCGAAGAA GAAGCAGTAC CGGAACCACC ACCTCCACCC CCACAGGAAA AAGCCCCGGC 15000 
ACCAATACCC CAACCACAAC CACCAACCCC CCCCACACAA CTCTTGGATA ATCCCCACGT 15060 
TCTAACCGCC CTGGTGACCT CCACCCTCGC CTGGAGCGTT GGCATCGGTT TTGCTACATT 15120 
CACTTATTTT TATCTAAAGG TAAATGGAAG TATATATATG GGGATGTGGA TGTATGTGGA 15180 
TGTATGTGAA TGTATGTGGA TGTATGTGGA TGTATGTGGA TGTGTTTTAT GGATATGTAT 15240 
TTGTGATTAT GTTTGGATAT ATATATATAT ATATATATGT TTATGTATAT GTGTTTTTGG 15300 
ATATATATAT GTGTATGTAT ATGATTTTCT GTATATGTAT TTGTGGGTTA AGGATATATA 15360 
TATATGGATG TACTTGTATG TGTTTTATAT ATATATTTTA TATATATGTA TTTATATTAA 15420 
AAAAGAAATA TAAAAACAAA TTTATTAAAA TGAAAAAAAG AAAAATGAAA TATAAAAAAA 15480 
AATTTATTAA AATAAAAAAA AAAAAAAAAA AAAAGGAGAA AAATTTTTTA AAAAATAATA 15540 
AAAATTATAA TAAAATATAA ATTTTGATAG AATAAAAAAT GAAAAAGATT ATCAAAAAAA 15600 
AATTAAAAAA AAATTTTATA TAAAAAAAAA ATGATTATAA AAAAAATAAA AACAAAAGAA 15660 
GAAAAAAAAA AACATTAAAA AAAAAAAAAT ATATATCATA AAAACAAAAA AAAAAGAAAA 15720 
AAATATATTA AAATAAAAAT ATATATCATA AAATAAAAAA AAATTAAAAA AATGTTAAAA 15780 
AAAAAATATA TACATAAAAT AAAAAAAATT TATTTAAATA AAAAAAAATA ATAAATAAAA 15840 
AAATTTAATT AAATAAAAAA AAATAATAAA TAAAAAAATT TAATTAAATA AAAAAAAATT 15900 
AAAAAAATTT AATGAAATAA AAAAAAATAA AAAAATTTAA TTAAATAAAA AAAATAAAAT 15960 
AAAATTAATT ACATGCACAT ATACATACAT ATATATATAT ATATACCCAT AACTACATAC 16020 
AACATTTACA CATACATATA TATATATATA TATACCCATA ACTACATACA CATTTACACA 16080 
TACATATATA TATTATATAT ATATATATAT ATACCCATAA CTACATACAT ATATACATTA 16140 
ACAAACACAT ATATAATACC TAAATACATA TATACATACA CATATATGTT CATTTTTTTT 16200 
TTTAGAAAAA AACCAAATCA TCTGTTGGAA ATTTATTCCA AATACTGCAA ATACCCAAAA 16260 
GTGATTATGA TATACCGACA AAACTTTCAC CCAATAGATA TATACCTTAT ACTAGTGGTA 16320 
AATACAGAGG CAAACGGTAC ATTTACCTTG AAGGAGATAG TGGAACAGAT AGTGGTTACA 16380 
CCGATCATTA TAGTGATATA ACTTCCTCAG AAAGTGAATA TGAAGAGATG GATATAAATG 16440 
ATATATATGT ACCAGGTAGT CCTAAATATA AAACATTAAT TGAAGTGGTA CTTGAACCTA 16500 
GTGGTAACAA CACAACAGCT AGTGGTAACA ACACAACAGC TAGTGGTAAC AACACAACAG 16560 
CTAGTGGTAA AAACACACCT AGTGATACAC AAAATGATAT ACAAAATGAT GGTATACCTA 16620 
GTAGTAAAAT TACAGATAAT GAATGGAATC AATTGAAAGA TGAATTTATA TCACAATATC 16680 
TACAAAGTGA ACCAAATACA GAACCAAATA TGTTAGGTTA TAATGTGGAT AATAATACCC 16740 
ANCCNACNAC GTCACATCAT AATGTGGAAG AAAAACCTTT TATTATGTCC ATTCATGATA 16800 
GAAATTTATT TAGTGGAGAA GAATACAATT ATGATATGTT TAATAGTGGG AATAATCCAA 16860 
TAAACATTAG TGATTCAACA AATAGTATGG ATAGTCTAAC AAGTAACAAC CATAGTCCAT 16920 
ATAATGATAA AAATGATTTA TATAGTGGTA TCGACCTAAT CAACGACGCA CTAAGTGGTA 16980 
ATCATATTGA TATATATGAT GAAATGCTCA AACGAAAAGA AAATGAATTA TTTGGAACAA 17040 
AACATCATAC AAAACATACA AATACATATA ATGTCGCCAA ACCTGCACGT GACGACCCTA 17100 
TAACCAATCA AATAAATTTG TTCCATAAAT GGTTAGATAG GCATAGAGAT ATGTGCGAAA 17160 
AGTGGAAAAA TAATCACGAA CGGTTACCCA AATTGAAAGA ATTGTGGGAA AATGAGACAC 17220 
ATAGTGGTGA CATAAATAGT GGTATACCTA GTGGTAACCA TGTGTTGAAT ACTGATGTTT 17280 
CTATTCAAAT AGATATGGAT AATCCTAAAA CAAAGAATGA AATTACGAAT ATGGATACAA 17340 
ACCCAGACAA ATCTACTATG GATACTATAC TGGATGATCT GGAAAAATAT AATGAACCCT 17400 
ACTACTATGA TTTTTATGAA GATGATATCA TCTATCATGA TGTAGATGTT GAAAAATCAT 17460 
CTATGGATGA TATATATGTG GATCATAATA ATGTGACTAA TAATAATATG GATGTACCTA 17520 
CTAAAATGCA CATCGAAATG AATATTGTTA ATAATAAAAA GGAGATTTTC GAAGAGGAAT 17580 
ATCCTATATC AGATATATGG AATATCTAAA ATTAATATAC TTTTTTTGTG TGTGTCATAT 17640 
ATATTTTGTA TTATTTGTAT ATGTTTTTAT TTTATTTATT TATTTATTTA TTTATTGTTT 17700 
TTGGTATATT TGTAAAAAAT ATGTTTTTGT TTATAATCAT ATTATTATAT TTTTAATAAT 17760 
TTGCAACATG ATTTTTTTTT TTCTTTCTTA TTGTGTAATT TTTTTCATAA TATTTATATA 17820 
TATATATGTA TTTTATTTTT TAGTATAATA ATTGTATCTA TATTTGATTA ATAATTATGT 17880 
ATATTATGGT TATTTTGTTT CTTTTTCTGT ACATTTTTTC GTAATATATA TATATATATA 17940 

TATATATAAT TCTCTTTTTC TAATATATAT ATCCTTCTAT TTTCGATTTT TTCATTTTTT 18000 
TCCAGTATTA ATTTATTTAT TTATTTGTGA TATTTTATAA TATATTATTT AAATGTGTAT 18060 
TTATATATGT GTTTTATATA TGTGTTTTAT TTTTGTTACT CTAATTCTGA ATAATCCGAG 18120 
CGAAAAAAAA ATATATAATC TCATATAAAA ATTATTTATA ATACAATATT ATATAGTTTC 18180 
CTATTAAAAT AAATTAATAT AATATACAAT AATATTTCTT GTTATTTTTA TAAATATAAC 18240 

TAATTTCTTA TTTTTATTTA ACTTTATTCC TTTTTAATTT CTTAATTCTT TTATCAAACA 18300 
AAAAACATAA AGTAATTCTA CATATCAACA AAAAAAAAAA AAAAAAAAAA AAAAAAAATT 18360 
TATTATAATA TAATAAAAAA TATAAAGACA TACGTTCACT TATTATTATA AATGATTTAT 18420 
TACGATTAAA ACATATTGAG ATTATAATAA TATAATTTAA CATAGAAAGA GTTAAGAATA 18480 
CATTTTTTTT TTTATTTCGA TATGTAATTC AACATATATA TATATATATA TCTTTTTAAT 18540 

TTAATTAAAT AAAATTCCTT ATTATTCATA TTGTTTCTTT TATCACATGT GAAATATTAA 18600 
AAATAATTTT CGATTTTATC GATATATTTA TGTCGTTTAT ATACTTATAT AGGTCTTTAT 18660 
AACTATTGAT TAATAGAAGG TAATAGCCTA ATAATATAAA TACTCGTATT TATAAATTCA 18720 
TTTATATATT TCAAATATAT TTGCATGGTT TATTTTCAAA TACAATTAAT TAGATTTCTT 18780 
AAATATTTCT TCATTTATTC ATTTTTATAG CATATACATG CACATTATAA ATTATTAATA 18840 

AAAAATTTTT ATTTTAATAT ATAATAACAA TTTTCATACA TTACATTTTT CACACAACAT 18900 
TTAAGTTGTC ATAATGTAAC ACATTAAATA ATATATTACT TATATATATA TAATTATTAA 18960 
TTATATATTA AATAAAAATG TATTATCGCC TGTATTATCA TAGTATATAT AATGTTGTAT 19020 
AACGCTTCAA AATATATATA ATAATATAAT TAAAAATATA TATATAGTAA TTAATTATTT 19080 
TGTTATGTTA TGTAATAATG CAATTAATAT AAGATAAAAT TCAT 19124 


(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 3060 amino acids 
(B) TYPE: amino acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Asp 
Glu 
Ser 

Pro Gln Thr Pro Glu Asp Pro Cys Asp Leu Asp His Lys Tyr His Thr 

Arg 

Thr Asn Thr His Asn Leu Leu Val Asp Val Cys Met Ala Ala Lys Phe 
145 150 155 160 

Glu Gly Gin Ser He Thr Gin Asp Tyr Pro Lys Tyr Gin Ala Thr Tyr 
165 170 175 

Gly Asp Ser Pro Ser Gln Ile Cys Thr Met Leu Ala Arg Ser Phe Ala 

180 185 190 

Asp He Gly Asp He Val Arg Gly Arg Asp Leu Tyr Leu Gly Asn Pro 

195 200 205 

Gin Glu He Lys Gin Arg Gin Gin Leu Glu Asn Asn Leu Lys Thr He 
210 215 220 

Phe Gly Lys He Tyr Glu Lys Leu Asn Gly Ala Glu Ala Arg Tyr Gly 
225 230 235 240 

Asn Asp Pro Glu Phe Phe Lys Leu Arg Glu Asp Trp Trp Thr Ala Asn 

245 250 255 

Arg Glu Thr Val Trp Lys Ala He Thr Cys Asn Ala Trp Gly Asn Thr 

260 265 270 

Tyr Phe His Ala Thr Cys Asn Arg Gly Glu Arg Thr Lys Gly Tyr Cys 

275 280 285 

Arg Cys Asn Asp Asp Gin Val Pro Thr Tyr Phe Asp Tyr Val Pro Gin 
290 295 300 

Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu Asp Phe Cys Arg Lys Lys 
305 310 315 320 

Asn Lys Lys He Lys Asp Val Lys Arg Asn Cys Arg Gly Lys Asp Lys 

325 330 335 

Glu Asp Lys Asp Arg Tyr Cys Ser Arg Asn Gly Tyr Asp Cys Glu Lys 

340 345 350 

Thr Lys Arg Ala He Gly Lys Leu Arg Tyr Gly Lys Gin Cys He Ser 

355 360 365 

Cys Leu Tyr Ala Cys Asn Pro Tyr Val Asp Trp He Asn Asn Gin Lys 
370 375 380 

Glu Gin Phe Asp Lys Gin Lys Lys Lys Tyr Asp Glu Glu He Lys Lys 
385 390 395 400 

Tyr Glu Asn Gly Ala Ser Gly Gly Ser Arg Gin Lys Arg Asp Ala Gly 

405 410 415 

Gly Thr Thr Thr Thr Asn Tyr Asp Gly Tyr Glu Lys Lys Phe Tyr Asp 

420 425 430 

Glu Leu Asn Lys Ser Glu Tyr Arg Thr Val Asp Lys Phe Leu Glu Lys 

435 440 445 

Leu Ser Asn Glu Glu He Cys Thr Lys Val Lys Asp Glu Glu Gly Gly 
450 455 460 

Thr He Asp Phe Lys Asn Val Asn Ser Asp Ser Thr Ser Gly Ala Ser 
465 470 475 480 

Gly Thr Asn Val Glu Ser Gin Gly Thr Phe Tyr Arg Ser Lys Tyr Cys 
485 490 495 

Gln Pro Cys Pro Tyr Cys Gly Val Lys Lys Val Asn Asn Gly Gly Ser 

500 505 510 

Ser Asn Glu Trp Glu Glu Lys Asn Asn Gly Lys Cys Lys Ser Gly Lys 

515 520 525 

Leu Tyr Glu Pro Lys Pro Asp Lys Glu Gly Thr Thr He Thr He Leu 
530 535 540 

Lys Ser Gly Lys Gly His Asp Asp He Glu Glu Lys Leu Asn Lys Phe 
545 550 555 560 

Cys Asp Glu Lys Asn Gly Asp Thr He Asn Ser Gly Gly Ser Gly Thr 

565 570 575 

Gly Gly Ser Gly Gly Gly Asn Ser Gly Arg Gin Glu Leu Tyr Glu Glu 

580 585 590 

Trp Lys Cys Tyr Lys Gly Glu Asp Val Val Lys Val Gly His Asp Glu 

595 600 605 

Asp Asp Glu Glu Asp Tyr Glu Asn Val Lys Asn Ala Gly Gly Leu Cys 

610 615 620 

He Leu Lys Asn Gin Lys Lys Asn Lys Glu Glu Gly Gly Asn Thr Ser 
625 630 635 640 

Glu Lys Glu Pro Asp Glu He Gin Lys Thr Phe Asn Pro Phe Phe Tyr 

645 650 655 

Tyr Trp Val Ala His Met Leu Lys Asp Ser Ile His Trp Lys Lys Lys 
660 665 670 

Leu Gln Arg Cys Leu Gln Asn Gly Asn Arg Ile Lys Cys Gly Asn Asn 

675 680 685 

Lys Cys Asn Asn Asp Cys Glu Cys Phe Lys Arg Trp He Thr Gin Lys 

690 695 700 

Lys Asp Glu Trp Gly Lys He Val Gin His Phe Lys Thr Gin Asn He 
705 710 715 720 

Lys Gly Arg Gly Gly Ser Asp Asn Thr Ala Glu Leu He Pro Phe Asp 

725 730 735 

His Asp Tyr Val Leu Gin Tyr Asn Leu Gin Glu Glu Phe Leu Lys Gly 

740 745 750 

Asp Ser Glu Asp Ala Ser Glu Glu Lys Ser Glu Asn Ser Leu Asp Ala 

755 760 765 

Glu Glu Ala Glu Glu Leu Lys His Leu Arg Glu He He Glu Ser Glu 
770 775 780 

Asp Asn Asn Gin Glu Ala Ser Val Gly Gly Gly Val Thr Glu Gin Lys 
785 790 795 800 

Asn Ile Met Asp Lys Leu Leu Asn Tyr Glu Lys Asp Glu Ala Asp Leu 
805 810 815 

Cys Leu Glu Ile His Glu Asp Glu Glu Glu Glu Lys Glu Lys Gly Asp 

820 825 830 

Gly Asn Glu Cys He Glu Glu Gly Glu Asn Phe Arg Tyr Asn Pro Cys 

835 840 845 

Ser Gly Glu Ser Gly Asn Lys Arg Tyr Pro Val Leu Ala Asn Lys Val 
850 855 860 

Ala Tyr Gin Met His His Lys Ala Lys Thr Gin Leu Ala Ser Arg Ala 
865 870 875 880 

Gly Arg Ser Ala Leu Arg Gly Asp He Ser Leu Ala Gin Phe Lys Asn 
885 890 895 

Gly Arg Asn Gly Ser Thr Leu Lys Gly Gln Ile Cys Lys Ile Asn Glu 

900 905 910 

Asn Tyr Ser Asn Asp Ser Arg Gly Asn Ser Gly Gly Pro Cys Thr Gly 

915 920 925 

Lys Asp Gly Asp His Gly Gly Val Arg Met Arg He Gly Thr Glu Trp 
930 935 940 

Ser Asn He Glu Gly Lys Lys Gin Thr Ser Tyr Lys Asn Val Phe Leu 
945 950 955 960 

Pro Pro Arg Arg Glu His Met Cys Thr Ser Asn Leu Glu Asn Leu Asp 

965 970 975 

Val Gly Ser Val Thr Lys Asn Asp Lys Ala Ser His Ser Leu Leu Gly 

980 985 990 

Asp Val Gin Leu Ala Ala Lys Thr Asp Ala Ala Glu He He Lys Arg 

995 1000 1005 

Tyr Lys Asp Gin Asn Asn He Gin Leu Thr Asp Pro He Gin Gin Lys 
1010 1015 1020 

Asp Gin Glu Ala Met Cys Arg Ala Val Arg Tyr Ser Phe Ala Asp Leu 
1025 1030 1035 1040 

Gly Asp He He Arg Gly Arg Asp Met Trp Asp Glu Asp Lys Ser Ser 

1045 1050 1055 

Thr Asp Met Glu Thr Arg Leu He Thr Val Phe Lys Asn He Lys Glu 

1060 1065 1070 

Lys His Asp Gly He Lys Asp Asn Pro Lys Tyr Thr Gly Asp Glu Ser 

1075 1080 1085 

Lys Lys Pro Ala Tyr Lys Lys Leu Arg Ala Asp Trp Trp Glu Ala Asn 
1090 1095 1100 

Arg His Gin Val Trp Arg Ala Met Lys Cys Ala Thr Lys Gly He He 
1105 1110 1115 1120 

Cys Pro Gly Met Pro Val Asp Asp Tyr He Pro Gin Arg Leu Arg Trp 

1125 1130 1135 

Met Thr Glu Trp Ala Glu Trp Tyr Cys Lys Ala Gln Ser Gln Glu Tyr 

1140 1145 1150 

Asp Lys Leu Lys Lys He Cys Ala Asp Cys Met Ser Lys Gly Asp Gly 

1155 1160 1165 

Lys Cys Thr Gin Gly Asp Val Asp Cys Gly Lys Cys Lys Ala Ala Cys 
1170 1175 1180 
Asp Lys Tyr Lys Glu Glu Ile Glu Lys Trp Asn Glu Gln Trp Arg Lys 
1185 1190 1195 1200 

He Ser Asp Lys Tyr Asn Leu Leu Tyr Leu Gin Ala Lys Thr Thr Ser 
1205 1210 1215 

Thr Asn Pro Gly Arg Thr Val Leu Gly Asp Asp Asp Pro Asp Tyr Gln 

1220 1225 1230 

Gin Met Val Asp Phe Leu Thr Pro He His Lys Ala Ser He Ala Ala 

1235 1240 1245 

Arg Val Leu Val Lys Arg Ala Ala Gly Ser Pro Thr Glu He Ala Ala 
1250 1255 1260 

Ala Ala Pro He Thr Pro Tyr Ser Thr Ala Ala Gly Tyr He His Gin 
1265 1270 1275 1280 

Glu He Gly Tyr Gly Gly Cys Gin Glu Gin Thr Gin Phe Cys Glu Lys 

1285 1290 1295 

Lys His Gly Ala Thr Ser Thr Ser Thr Thr Lys Glu Asn Lys Glu Tyr 

1300 1305 1310 

Thr Phe Lys Gin Pro Pro Pro Glu Tyr Ala Thr Ala Cys Asp Cys He 

1315 1320 1325 

Asn Arg Ser Gin Thr Glu Glu Pro Lys Lys Lys Glu Glu Asn Val Glu 
1330 1335 1340 

Ser Ala Cys Lys He Val Glu Lys He Leu Glu Gly Lys Asn Gly Arg 
1345 1350 1355 1360 

Thr Thr Val Gly Glu Cys Asn Pro Lys Glu Ser Tyr Pro Asp Trp Asp 

1365 1370 1375 

Cys Lys Asn Asn He Asp He Ser His Asp Gly Ala Cys Met Pro Pro 

1380 1385 1390 

Arg Arg Gin Lys Leu Cys Leu Tyr Tyr He Ala His Glu Ser Gin Thr 

1395 1400 1405 

Glu Asn He Lys Thr Asp Asp Asn Leu Lys Asp Ala Phe He Lys Thr 
1410 1415 1420 

Ala Ala Ala Glu Thr Phe Leu Ser Trp Gin Tyr Tyr Lys Ser Lys Asn 
1425 1430 1435 1440 

Asp Ser Glu Ala Lys He Leu Asp Arg Gly Leu He Pro Ser Gin Phe 

1445 1450 1455 

Leu Arg Ser Met Met Tyr Thr Phe Gly Asp Tyr Arg Asp He Cys Leu 

1460 1465 1470 

Asn Thr Asp He Ser Lys Lys Gin Asn Asp Val Ala Lys Ala Lys Asp 

1475 1480 1485 

Lys He Gly Lys Phe Phe Ser Lys Asp Gly Ser Lys Ser Pro Ser Gly 
1490 1495 1500 

Leu Ser Arg Gin Glu Trp Trp Lys Thr Asn Gly Pro Glu He Trp Lys 
1505 1510 1515 1520 

Gly Met Leu Cys Ala Leu Thr Lys Tyr Val Thr Asp Thr Asp Asn Lys 

1525 1530 1535 

Arg Lys He Lys Asn Asp Tyr Ser Tyr Asp Lys Val Asn Gin Ser Gin 

1540 1545 1550 

Asn Gly Asn Pro Ser Leu Glu Glu Phe Ala Ala Lys Pro Gin Phe Leu 

1555 1560 1565 

Arg Trp Met He Glu Trp Gly Glu Glu Phe Cys Ala Glu Arg Gin Lys 
1570 1575 1580 

Lys Glu Asn He He Lys Asp Ala Cys Asn Glu He Asn Ser Thr Gin 
1585 1590 1595 1600 

Gin Cys Asn Asp Ala Lys His Arg Cys Asn Gin Ala Cys Arg Ala Tyr 
1605 1610 1615 

Gln Glu Tyr Val Glu Asn Lys Lys Lys Glu Phe Ser Gly Gln Thr Asn 

1620 1625 1630 

Asn Phe Val Leu Lys Ala Asn Val Gin Pro Gin Asp Pro Glu Tyr Lys 

1635 1640 1645 

Gly Tyr Glu Tyr Lys Asp Gly Val Gin Pro He Gin Gly Asn Glu Tyr 

1650 1655 1660 

Leu Leu Gin Lys Cys Asp Asn Asn Lys Cys Ser Cys Met Asp Gly Asn 
1665 1670 1675 1680 

Val Leu Ser Val Ser Pro Lys Glu Lys Pro Phe Gly Lys Tyr Ala His 
1685 1690 1695 

Lys Tyr Pro Glu Lys Cys Asp Cys Tyr Gln Gly Lys His Val Pro Ser 
1700 1705 1710 

Ile Pro Pro Pro Pro Pro Pro Val Gln Pro Gln Pro Glu Ala Pro Thr 

1715 1720 1725 

Val Thr Val Asp Val Cys Ser Ile Val Lys Thr Leu Phe Lys Asp Thr 

1730 1735 1740 

Asn Asn Phe Ser Asp Ala Cys Gly Leu Lys Tyr Gly Lys Thr Ala Pro 
1745 1750 1755 1760 

Ser Ser Trp Lys Cys Ile Pro Ser Asp Thr Lys Ser Gly Ala Gly Ala 

1765 1770 1775 

Thr Thr Gly Lys Ser Gly Ser Asp Ser Gly Ser He Cys He Pro Pro 

1780 1785 1790 

Arg Arg Arg Arg Leu Tyr Val Gly Lys Leu Gin Glu Trp Ala Thr Ala 

1795 1800 1805 

Leu Pro Gin Gly Glu Gly Ala Ala Pro Ser His Ser Arg Ala Asp Asp 
1810 1815 1820 

Leu Arg Asn Ala Phe He Gin Ser Ala Ala He Glu Thr Phe Phe Leu 
1825 1830 1835 1840 

Trp Asp Arg Tyr Lys Glu Glu Lys Lys Pro Gin Gly Asp Gly Ser Gin 

1845 1850 1855 

Gin Ala Leu Ser Gin Leu Thr Ser Thr Tyr Ser Asp Asp Glu Glu Asp 

1860 1865 1870 

Pro Pro Asp Lys Leu Leu Gin Asn Gly Lys He Pro Pro Asp Phe Leu 

1875 1880 1885 

Arg Leu Met Phe Tyr Thr Leu Gly Asp Tyr Arg Asp He Leu Val His 
1890 1895 1900 

Gly Gly Asn Thr Ser Asp Ser Gly Asn Thr Asn Gly Ser Asn Asn Asn 
1905 1910 1915 1920 

Asn He Val Leu Glu Ala Ser Gly Asn Lys Glu Asp Met Gin Lys He 

1925 1930 1935 

Gin Glu Lys He Glu Gin He Leu Pro Lys Asn Gly Gly Thr Pro Leu 

1940 1945 1950 

Val Pro Lys Ser Ser Ala Gin Thr Pro Asp Lys Trp Trp Asn Glu His 

1955 1960 1965 

Ala Glu Ser He Trp Lys Gly Met He Cys Ala Leu Thr Tyr Thr Glu 
1970 1975 1980 

Lys Asn Pro Asp Thr Ser Ala Arg Gly Asp Glu Asn Lys He Glu Lys 
1985 1990 1995 2000 

Asp Asp Glu Val Tyr Glu Lys Phe Phe Gly Ser Thr Ala Asp Lys His 

2005 2010 2015 

Gly Thr Ala Ser Thr Pro Thr Gly Thr Tyr Lys Thr Gin Tyr Asp Tyr 

2020 2025 2030 

Glu Lys Val Lys Leu Glu Asp Thr Ser Gly Ala Lys Thr Pro Ser Ala 

2035 2040 2045 

Ser Ser Asp Thr Pro Leu Leu Ser Asp Phe Val Leu Arg Pro Pro Tyr 
2050 2055 2060 

Phe Arg Tyr Leu Glu Glu Trp Gly Gin Asn Phe Cys Lys Lys Arg Lys 
2065 2070 2075 2080 

His Lys Leu Ala Gin He Lys His Glu Cys Lys Val Glu Glu Asn Gly 

2085 2090 2095 

Gly Gly Ser Arg Arg Gly Gly He Thr Arg Gin Tyr Ser Gly Asp Gly 

2100 2105 2110 

Glu Ala Cys Asn Glu Met Leu Pro Lys Asn Asp Gly Thr Val Pro Asp 

2115 2120 2125 

Leu Glu Lys Pro Ser Cys Ala Lys Pro Cys Ser Ser Tyr Arg Lys Trp 
2130 2135 2140 

He Glu Ser Lys Gly Lys Glu Phe Glu Lys Gin Glu Lys Ala Tyr Glu 
2145 2150 2155 2160 

Gin Gin Lys Asp Lys Cys Val Asn Gly Ser Asn Lys His Asp Asn Gly 

2165 2170 2175 

Phe Cys Glu Thr Leu Thr Thr Ser Ser Lys Ala Lys Asp Phe Leu Lys 

2180 2185 2190 

Thr Leu Gly Pro Cys Lys Pro Asn Asn Val Glu Gly Lys Thr He Phe 

2195 2200 2205 

Asp Asp Asp Lys Thr Phe Lys His Thr Lys Asp Cys Asp Pro Cys Leu 
2210 2215 2220 
Lys Phe Ser Val Asn Cys Lys Lys Asp Glu Cys Asp Asn Ser Lys Gly 
2225 2230 2235 2240 

Thr Asp Cys Arg Asn Lys Asn Ser He Asp Ala Thr Asp He Glu Asn 
2245 2250 2255 

Gly Val Asp Ser Thr Val Leu Glu Met Arg Val Ser Ala Asp Ser Lys 

2260 2265 2270 

Ser Gly Phe Asn Gly Asp Gly Leu Glu Asn Ala Cys Arg Gly Ala Gly 

2275 2280 2285 

He Phe Glu Gly He Arg Lys Asp Glu Trp Lys Cys Arg Asn Val Cys 
2290 2295 2300 

Gly Tyr Val Val Cys Lys Pro Glu Asn Val Asn Gly Glu Ala Lys Gly 
2305 2310 2315 2320 

Lys His He He Gin He Arg Ala Leu Val Lys Arg Trp Val Glu Tyr 
2325 2330 2335 

Phe Phe Glu Asp Tyr Asn Lys Ile Lys His Lys Ile Ser His Arg Ile 

2340 2345 2350 

Lys Asn Gly Glu He Ser Pro Cys He Lys Asn Cys Val Glu Lys Trp 

2355 2360 2365 

Val Asp Gin Lys Arg Lys Glu Trp Lys Glu He Thr Glu Arg Phe Lys 
2370 2375 2380 

Asp Gin Tyr Lys Asn Asp Asn Ser Asp Asp Asp Asn Val Arg Ser Phe 
2385 2390 2395 2400 

Leu Glu Thr Leu He Pro Gin He Thr Asp Ala Asn Ala Lys Asn Lys 
2405 2410 2415 

Val Ile Lys Leu Ser Lys Phe Gly Asn Ser Cys Gly Cys Ser Ala Ser 

2420 2425 2430 

Ala Asn Glu Gin Asn Lys Asn Gly Glu Tyr Lys Asp Ala He Asp Cys 

2435 2440 2445 

Met Leu Lys Lys Leu Lys Asp Lys He Gly Glu Cys Glu Lys Lys His 
2450 2455 2460 

His Gin Thr Ser Asp Thr Glu Cys Ser Asp Thr Pro Gin Pro Gin Thr 
2465 2470 2475 2480 

Leu Glu Asp Glu Thr Leu Asp Asp Asp He Glu Thr Glu Glu Ala Lys 
2485 2490 2495 

Lys Asn Met Met Pro Lys Ile Cys Glu Asn Val Leu Lys Thr Ala Gln 

2500 2505 2510 

Gin Glu Asp Glu Gly Gly Cys Val Pro Ala Glu Asn Ser Glu Glu Pro 

2515 2520 2525 

Ala Ala Thr Asp Ser Gly Lys Glu Thr Pro Glu Gin Thr Pro Val Leu 
2530 2535 2540 

Lys Pro Glu Glu Glu Ala Val Pro Glu Pro Pro Pro Pro Pro Pro Gin 
2545 2550 2555 2560 

Glu Lys Ala Pro Ala Pro He Pro Gin Pro Gin Pro Pro Thr Pro Pro 
2565 2570 2575 

Thr Gln Leu Leu Asp Asn Pro His Val Leu Thr Ala Leu Val Thr Ser 

2580 2585 2590 

Thr Leu Ala Trp Ser Val Gly Ile Gly Phe Ala Thr Phe Thr Tyr Phe 

2595 2600 2605 

Tyr Leu Lys Lys Lys Thr Lys Ser Ser Val Gly Asn Leu Phe Gin He 
2610 2615 2620 

Leu Gin He Pro Lys Ser Asp Tyr Asp He Pro Thr Lys Leu Ser Pro 
2625 2630 2635 2640 

Asn Arg Tyr He Pro Tyr Thr Ser Gly Lys Tyr Arg Gly Lys Arg Tyr 
2645 2650 2655 

Ile Tyr Leu Glu Gly Asp Ser Gly Thr Asp Ser Gly Tyr Thr Asp His 

2660 2665 2670 

Tyr Ser Asp He Thr Ser Ser Glu Ser Glu Tyr Glu Glu Met Asp He 

2675 2680 2685 

Asn Asp He Tyr Val Pro Gly Ser Pro Lys Tyr Lys Thr Leu He Glu 
2690 2695 2700 

Val Val Leu Glu Pro Ser Gly Asn Asn Thr Thr Ala Ser Gly Asn Asn 
2705 2710 2715 2720 

Thr Thr Ala Ser Gly Asn Asn Thr Thr Ala Ser Gly Lys Asn Thr Pro 
2725 2730 2735 

Ser Asp Thr Gln Asn Asp Ile Gln Asn Asp Gly Ile Pro Ser Ser Lys 
2740 2745 2750 

Ile Thr Asp Asn Glu Trp Asn Gln Leu Lys Asp Glu Phe Ile Ser Gln 

2755 2760 2765 

Tyr Leu Gln Ser Glu Pro Asn Thr Glu Pro Asn Met Leu Gly Tyr Asn 
2770 2775 2780 

Val Asp Asn Asn Thr His Pro Thr Thr Ser His His Asn Val Glu Glu 
2785 2790 2795 2800 

Lys Pro Phe Ile Met Ser Ile His Asp Arg Asn Leu Phe Ser Gly Glu 
2805 2810 2815 

Glu Tyr Asn Tyr Asp Met Phe Asn Ser Gly Asn Asn Pro Ile Asn Ile 

2820 2825 2830 

Ser Asp Ser Thr Asn Ser Met Asp Ser Leu Thr Ser Asn Asn His Ser 

2835 2840 2845 

Pro Tyr Asn Asp Lys Asn Asp Leu Tyr Ser Gly He Asp Leu He Asn 
2850 2855 2860 

Asp Ala Leu Ser Gly Asn His He Asp He Tyr Asp Glu Met Leu Lys 
2865 2870 2875 2880 

Arg Lys Glu Asn Glu Leu Phe Gly Thr Lys His His Thr Lys His Thr 
2885 2890 2895 

Asn Thr Tyr Asn Val Ala Lys Pro Ala Arg Asp Asp Pro Ile Thr Asn 

2900 2905 2910 

Gin He Asn Leu Phe His Lys Trp Leu Asp Arg His Arg Asp Met Cys 

2915 2920 2925 

Glu Lys Trp Lys Asn Asn His Glu Arg Leu Pro Lys Leu Lys Glu Leu 
2930 2935 2940 

Trp Glu Asn Glu Thr His Ser Gly Asp He Asn Ser Gly He Pro Ser 
2945 2950 2955 2960 

Gly Asn His Val Leu Asn Thr Asp Val Ser He Gin He Asp Met Asp 
2965 2970 2975 

Asn Pro Lys Thr Lys Asn Glu Ile Thr Asn Met Asp Thr Asn Pro Asp 

2980 2985 2990 

Lys Ser Thr Met Asp Thr He Leu Asp Asp Leu Glu Lys Tyr Asn Glu 

2995 3000 3005 

Pro Tyr Tyr Tyr Asp Phe Tyr Glu Asp Asp He He Tyr His Asp Val 
3010 3015 3020 

Asp Val Glu Lys Ser Ser Met Asp Asp He Tyr Val Asp His Asn Asn 
3025 3030 3035 3040 

Val Thr Asn Asn Asn Met Asp Val Pro Thr Lys Met His Ile Glu Met 
3045 3050 3055 

Asn Ile Val Asn 

3060 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7295 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

TCCAAGCTGT TTTTTTTTCT TTTTCTAGTT TTTCCATTGT ATATTCGTCA AATACGTACA 60 

CATATATATA TATATGTATA ACATGTGAGT ATTATTTTAT ACATCACATC GATTACATTT 120 

TAGCGTTTTT TTTCCCCAGA TCACATATAG TACGACTAAG AAACAAAATA ACATCATAAC 180 

AAACATAGTG ATTATCAATA CIATliATATTA CCACATAATA TAAAUTATTA AATAATATTA 240 

TTGCATGTTA GTGATAACTA CTATATCATA TACACCACTA CTAACTATCA CTACATAGTA 300 

ACAGTAGTAG TCACAATCAT AGCATCATGG TAATATAGAT TTTCATTTCA TATCTTCCTT 360 

ATTGTTTGTT ATACATACAC TATTAATATG TATTTATGTT ATAATGGTAG ACTATGTTAA 420 

CAATGTATGA ATGACCATCA TAAATTAATA ACAGACGCAT CAAAACAGTG TATATGTGTG 480 

CATTTATGAC ATAATGTAGT CGGGAAGCAT ACAAAAATGG AGCCAGGAGG TAGCGGTGGT 540
CGTGGTAGTG GCGGTAGTAG TAGTGGTAAA GGGAAGAAGG ATACATCTGA GTATATTTAT 600
GTGAGCGATG CTAAGGATCT TTTGGATAGA GTTGGAGAAA AAGTGTACGA AGAAAAAGTG 660 
AAAAATGGTG ATGCTAAAAA ATATATTGAG GCGTTGAAAG GAAATTTGAA CACAGCAAAT 720 
GGTCGTAGTT CGGAAACAGC TAGCAGTATT GAAACGTGCA CCCTTGTAAA AGAATATTAT 780 
GAGCGTGTTA ATGGTGATGG TAAAAGGCAT CCGTGCAGAA AAGACGCAAA AAATGAAGAT 840
GTAAACCGTT TTTCGGATAC ACTTGGTGGC CAATGTACAT ACAATAGGAT AAAAGATAGT 900 
CAACAGGGTG ATAATAAAGT AGGAGCCTGT GCTCCGTATA GACGATTACA TTTATGTGAT 960 
TATAATTTGG AATCTATAGA CACAACGTCG ACGACGCATA AGTTGTTGTT AGAGGTGTGT 1020 
ATGGCAGCAA AATACGAAGG AAACTCAATA AATACACATT ATACACAACA TCAACGAACT 1080 
AATGAGGATT CTGCTTCCCA ATTATGTACT GTATTAGCAC GAAGTTTTGC AGATATAGGT 1140
GATATCGTAA GAGGAAAAGA TCTATATCTC GGTTATGATA ATAAAGAAAA AGAACAAAGA 1200 
AAAAAATTAG AACAGAAATT GAAAGATATT TTCAAGAAAA TACATAAGGA CGTGATGAAG 1260 
ACGAATGGCG CACAAGAACG CTACATAGAT GATGCCAAAG GAGGAGATTT TTTTCAATTA 1320 
AGAGAAGATT GGTGGACGTC GAATCGAGAA ACAGTATGGA AAGCATTAAT ATGTCATGCA 1380 
CCAAAAGAAG CTAATTATTT TATAAAAACA GCGTGTAATG TAGGAAAAGG AACTAATGGT 1440
CAATGCCATT GCATTGGTGG AGATGTTCCC AGATATTTCG ATTATGTGCC GCAGTATCTT 1500 
CGCTGGTTCG AGGAATGGGC AGAAGACTTT TGCAGGAAAA AAAAAAAAAA ACTAGAAAAT 1560 
TTGCAAAAAC AGTGTCGTGA TTACGAACAA AATTTATATT GTAGTGGTAA TGGCTACGAT 1620 
TGCACAAAAA CTATATATAA AAAAGGTAAA CTTGTTATAG GTGAACATTG TACAAACTGT 1680 
TCTGTTTGGT GTCGTATGTA TGAAACTTGG ATAGATAACC AGAAAAAAGA ATTTCTAAAA 1740
CAAAAAAGAA AATACGAAAC AGAAATATCA GGTGGTGGTA GTGGTAAGAG TCCTAAAAGG 1800 
ACAAAACGGG CTGCACGTAG TAGTAGTAGT AGTGATGATA ATGGGTATGA AAGTAAATTT 1860 
TATAAAAAAC TGAAAGAAGT TGGCTACCAA GATGTCGATA AATTTTTAAA AATATTAAAC 1920 
AAAGAAGGAA TATGTCAAAA ACAACCTCAA GTAGGAAATG AAAAAGCAGA TAATGTTGAT 1980 

TTTACTAATG AAAAATATGT AAAAACATTT TCTCGTACAG AAATTTGTGA ACCGTGCCCA 2040
TGGTGTGGAT TGGAAAAAGG TGGTCCACCA TGGAAAGTTA AAGGTGACAA AACCTGCGGA 2100 
AGTGCAAAAA CAAAGACATA CGATCCTAAA AATATTACCG ATATACCAGT ACTCTACCCT 2160 
GATAAATCAC AGCAAAATAT ACTAAAAAAA TATAAAAATT TTTGTGAAAA AGGTGCACCT 2220 
GGTGGTGGTC AAATTAAAAA ATGGCAATGT TATTATGATG AACATAGGCC TAGTAGTAAA 2280 

AATAATAATA ATTGTGTAGA AGGAACATGG GACAAGTTTA CACAAGGTAA ACAAACCGTT 2340
AAGTCCTATA ATGTTTTTTT TTGGGATTGG GTTCATGATA TGTTACACGA TTCTGTAGAG 2400 
TGGAAGACAG AACTTAGTAA GTGTATAAAT AATAACACTA ATGGCAACAC ATGTAGAAAC 2460 
AATAATAAAT GTAAAACAGA TTGTGGTTGT TTTCAAAAAT GGGTTGAAAA AAAACAACAA 2520 
GAATGGATGG CAATAAAAGA CCATTTTGGA AAGCAAACAG ATATTGTCCA ACAAAAAGGT 2580 

CTTATCGTAT TTAGTCCCTA TGGAGTTCTT GACCTTGTTT TGAAGGGCGG TAATCTGTTG 2640
CAAAATATTA AAGATGTTCA TGGAGATACA GATGACATAA AACACATTAA GAAACTGTTG 2700 
GATGAGGAAG ACGCAGTAGC AGTTGTTCTT GGTGGCAAGG ACAATACCAC AATTGATAAA 2760 
TTACTACAAC ACGAAAAAGA ACAAGCAGAA CAATGCAAAC AAAAGCAGGA AGAATGCGAG 2820
AAAAAAGCAC AACAAGAAAG TCGTGGTCGC TCCGCCGAAA CCCGCGAAGA CGAAAGGACA 2880 

CAACAACCTG CTGATAGTGC CGGCGAAGTC GAAGAAGAAG AAGACGACGA CGACTACGAC 2940
GAAGACGACG AAGATGACGA CGTAGTCCAG GAGGAGGAAG AGGGAAAGGA GGAAGGAACG 3000 
GTCACAGAGG TAACAGAGGT AACAGAGGTC GTGGAAGAGA CGGTAACAGA ACAGGAAGGG 3060 
GTGAAGCCAT GTGACATAGT GGGCAAACTA TTTGAGGACG ACAAAAGTCT CAAAGAGGCA 3120 
TGTGGTCTAA AATACGGTCC AGGTGGAAAA GAAAAATTCC CCAATTGGAA GTGTGTCACA 3180 

CCAAGTGGTG TCAGTACTGC CACTAGTGGA AAAGACGGCG CTATATGTGT GCCACCCAGG 3240
AGACGACGAT TATACGTAGG TGGTTTATCA CAATGGGCAA GTCGTGGTGG TGACGAGACC 3300 
ACGGAGGTGT CGAGTGAAGC CACTTCGGCG CCGTCACAGT CAGAAAGTGA AAAACTACGT 3360 
ACTGCGTTTA TTGAGTCCGC TGCAATAGAG ACGTTTTTTT TGTGGCATAA GTATAAAGAA 3420 
GAGAAAAAAC CACCAGCAAC ACAAGATGGA GCGGGACTTG GAGTATCACT CCCAGAACCG 3480 

TCACCACCGG GAGAGGACCC CCAAACACAA TTACAACAAA CTGGTGTTAT ACCCCCCGAT 3540
TTTTTGCGTC AAATGTTTTA TACATTAGCA GACTACAAAG ACATATTATA CAGTGGTAGT 3600 
AACGACACAA GTGACACAAC TGGTAAACAG ACACCTAGTA GTAGTAATGA CAACCTCAAA 3660 
AATATTGTTC TGGAAGCAAG TGGTAGTACT GAGCAGGAGA AGGAGAAAAT GAAACAAATA 3720 
CAAGCGAAAA TAAAAAAAAT TTTAAACGGT GCCACATCTG GTGTCCCACC TGTCACCAAA 3780 

AATAGTGTCA AAACCCCCCA ACAAACCTGG TGGGAAAACA TCGCGAAGGA TATCTGGAAT 3840
GCTATGGTAT GTGCACTAAC ATATAAAGAA AATGACGCCA GAGGCACAAG TGCCAAAATA 3900 
GAACAGAATA AGGATTTGAA AAAGGCACTT TGGGACGAAG CCAACAAAAA CACCCCCATA 3960 
GAGAAATACC AATACACAAA TGTCAAACTC GAAGATGAAA GTGGTGCCAA AAGCAACGAC 4020 
ACCATCCAAC CCCCCACGTT AAAAAATTTT GTGGAAATAC CTACATTTTT TCGTTGGTTA 4080 

GATGAGTGGG GAAACAGTTT TTGTTTTGAG AGAGGAAAAG [remainder illegible in source] 4140
GAGTGTATGG ATGAGGATGG TGAAAAACAA TATAGTGGGG ATGGGGAATA TTGTGAAGAA 4200 
ATTTTTAGTA AGCAATATAA TGTTCTCCAG GATTTAAGTT CCAGTTGCGC TAAACCTTGT 4260 
AGATTGTATA AAACGTGGAT AGAAAAAAAA AAAACAGAAT ATGAGAAACA ACAAAAGGCA 4320 
TATGAACAAC AAAAAAGTAA TTACGAAAAT GAACAAAAAG ACAAATGCCA AACACAAAGT 4380 

AATAATAATG CTAATGAATT TTCTAGAACA CTAGGAGCGT CCCCTACAGC TGCAGAATTT 4440
TTACAAAAGT TAGGATCATC TAAAAATGAT AATGGATATG AGAATGGAGA GGATAATAAA 4500
ATAGATTTTA AAAATCCAGA TAAAACATTT AAGGAAGCAC ACAGTTGTGA TCCATGTCCT 4560 
ATAACTGGAG TTAAATGTCA AAATGGTCAT TGTGTGGGTT CTGCTAATGG AAAGGAGTGC 4620 
AAAAACAATA AGATTACTGC AGAAGATATT AAAAATAAGA CAGATCCTAA TGGAAACATA 4680 
GAAATGGTTG TCAGTGATGA CAGTACAAAT ACATTTGAAC ATTTAGGCGA TTGTAAAAGC 4740
TCAGGTATCT TTAAAGGTAT CAGAAAAGAT GAATGGAAAT GCGCTAATGT ATGTGGTGTA 4800 
GATATATGTA CTCTGGAAAA AAAAATTAAG AATGGGCAAG AAGGTGATAA AAAATATATC 4860 
ACAATGAAAG AATTGCTTAA ACGATGGCTA GAATATTTTT TAGAAGATTA TAATAGAATT 4920 
AGAAAAAAAA TAAAGCTATG TACGAAAAAG GAAGATGGAT GCAAATGTAT AAAAGGTTGT 4980 

ATAGAAAAAT GGGTACAAGA AAAAACGAAA GAATGGCAAA AAATAAACGA TACTTATCTT 5040
GAACAATATA AAAATGATGA TGGTAATACT TTAACTAATT TTTTGGAGCA ATTCCAATAT 5100 
CGAACTGAAT TTAAAAACGC TATAAAACCT TGTGATGGTT TAGACCAGTT CAAGACTTCG 5160 
TGTGGTCTTA ATAGTACTGA TAATTCACAA AATGGTAATA ATAACGATCT TGTTCTATGT 5220 
TTGCTTAATA AACTTCAAAA AAAAATTAGT GAGTGTAAAG AACAACATAG TGGCCAAACC 5280 

CAAACACCGT GTGATAACTC TTCCCTTAGT GGTAAAGAAT CCACCCTCGT TGAAGACGTT 5340
GATGATTATG AGGAACAAAA CCCAGAAAAC AAAGTGGAAC AACCTAAATT TTGTCCAGAT 5400 
ATGAAAGAAC CAAAAAAAGA AAACGATGAA GAAGTAGGCA CTTGTGGCGG AGACGAAGAA 5460 
AAAAAAAAAG TGGAAGACAG TGTAATCGAA CAAAAAGAGG AAGAAGCAGC TAGTGCCCCA 5520 
GAGGAATCTC CTCCATTAAC CCCGGAAGCA CCAAAAAAAG AGGAAAATGT GGTACCAAAA 5580 

CCACCACCAC CACCAAAAAA ACGCCGAATC AAAACCCGTA ATGTGTTGGA CCACCCCGCT 5640
GTCATACCCG CCCTCATGTC TTCTACCATC ATGTGGAGTA TTGGCATCGG TTTTGCTGCG 5700 
TTCACTTATT TTTATCTAAA GAAAAAAACC AAATCATCTG TTGGAAATTT ATTCCAAATA 5760 
CTGCAAATAC CCAAAAGTGA TTATGATATA CCTACATTGA AATCAAGCAA TCGTTATATA 5820 
CCCTATGCAA GTGATAGACA TAAAGGCAAA ACATATATTT ATATGGAAGG AGATAGCAGT 5880 

GGAGATGAAA AATATGCATT TATGTCTGAT ACTACTGATA TAACTTCATC CGAAAGTGAG 5940
TATGAAGAAT TGGATATTAA TGATATATAT GTACCAGGTA GTCCTAAATA TAAAACATTG 6000 
ATAGAAGTAG TACTTGAACC ATCAAAAAGA GATACACAAA ATGATATACA CAATGATATA 6060 
CCTAGTGATA TACCAAATAG TGACACACCA CCACCCATTA CTGATGATGA ATGGAATCAA 6120 
TTGAAAAAAG ATTTTATATC TAATATGTTA CAAAATACAC AAAATACGGA ACCAAATATT 6180 

TTACATGATA ATGTGGATAA TAATACCCAT CCTACCATGT CACGTCATAA TATGGACCAA 6240
AAACCTTTTA TTATGTCCAT ACATGATAGA AATTTATTTA GTGGAGAAGA ATACAATTAT 6300 
GATATGTTTA ATAGTGGGAA TAATCCAATA AACATTAGTG ATTCAACAAA TAGTATGGAT 6360 
AGTCTAACAA GTAACAACCA TAGTCCATAT AATGATAAAA ATGATTTATA TAGTGGTATC 6420 
GACCTAATCA ACGACGCACT AAGTGGTAAT CATATTGATA TATATGATGA AATGCTCAAA 6480 

CGAAAAGAAA ATGAATTATT CGGGACGCAA CATCATCCAA AAAATATAAC GTCTAACCGT 6540
GTCGTTACCC AAACAAGTAG TGACGACCCT ATAACCAATC AAATAAATTT GTTCCATAAA 6600 
TGGTTAGATA GGCATAGAGA TATGTGCGAA AAGTGGAAAA ATAATCACGA ACGGTTACCC 6660 
AAATTGAAAG AATTGTGGGA AAATGAGACA CATAGTGGTG ACATAAATAG TGGTATACCT 6720 
AGTGGTAACC ATGTGTTGAA TACTGATGTT TCTATTCAAA TAGATATGGA TAATCCGAAA 6780 

ACAATGAATG AATTTACTAA TATGGATACA AACCCCGACA AATCTACTAT GGATACTATA 6840
TTGGATGATC TAGAAAAATA TAACGAACCC TACTACTATG ATTTTTATAA ACATGATATC 6900 
TATTATGATG TAAATGATGA TAAAGCATCT GAGGATCATA TAAATATGGA TCATAATAAG 6960 
ATGGATAATA ATAATTCGGA TGTCCCCACT AACGTACAAA TTGAAATGAA TGTCATTAAT 7020 
AATCAGGAGT TACTACAAAA TGAATATCCT ATATCGCATA TGTAGGGAAT ATGAAAATAA 7080 

TAGATGTATA TATGTTTTTT TCTTTTTTTG TGTGTGTGCA GTTTATATTT TTTATTTGTA 7140
GATGTTATAT ATTTTTTTTA TTTGTGGGTT ATATTATAAT TTTTATTTAT GGGTTATATA 7200
TATATTTTTT TTTTTGTGCA TTTGTCTATT TTTTATTTGT GCTTTATATA TATATATATT 7260 
TTATTCAGCT TGGACTTAAC CAGGCTGAAC TTGCT 7295 

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2182 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Glu Pro Gly Gly Ser Gly Gly Arg Gly Ser Gly Gly Ser Ser Ser
1 5 10 15

Gly Lys Gly Lys Lys Asp Thr Ser Glu Tyr Ile Tyr Val Ser Asp Ala
20 25 30

Lys Asp Leu Leu Asp Arg Val Gly Glu Lys Val Tyr Glu Glu Lys Val
35 40 45

Lys Asn Gly Asp Ala Lys Lys Tyr Ile Glu Ala Leu Lys Gly Asn Leu
50 55 60

Asn Thr Ala Asn Gly Arg Ser Ser Glu Thr Ala Ser Ser Ile Glu Thr
65 70 75 80

Cys Thr Leu Val Lys Glu Tyr Tyr Glu Arg Val Asn Gly Asp Gly Lys
85 90 95

Arg His Pro Cys Arg Lys Asp Ala Lys Asn Glu Asp Val Asn Arg Phe
100 105 110

Ser Asp Thr Leu Gly Gly Gln Cys Thr Tyr Asn Arg Ile Lys Asp Ser
115 120 125

Gln Gln Gly Asp Asn Lys Val Gly Ala Cys Ala Pro Tyr Arg Arg Leu
130 135 140

His Leu Cys Asp Tyr Asn Leu Glu Ser Ile Asp Thr Thr Ser Thr Thr
145 150 155 160

His Lys Leu Leu Leu Glu Val Cys Met Ala Ala Lys Tyr Glu Gly Asn
165 170 175

Ser Ile Asn Thr His Tyr Thr Gln His Gln Arg Thr Asn Glu Asp Ser
180 185 190

Ala Ser Gln Leu Cys Thr Val Leu Ala Arg Ser Phe Ala Asp Ile Gly
195 200 205

Asp Ile Val Arg Gly Lys Asp Leu Tyr Leu Gly Tyr Asp Asn Lys Glu
210 215 220

Lys Glu Gln Arg Lys Lys Leu Glu Gln Lys Leu Lys Asp Ile Phe Lys
225 230 235 240

Lys Ile His Lys Asp Val Met Lys Thr Asn Gly Ala Gln Glu Arg Tyr
245 250 255

Ile Asp Asp Ala Lys Gly Gly Asp Phe Phe Gln Leu Arg Glu Asp Trp
260 265 270

Trp Thr Ser Asn Arg Glu Thr Val Trp Lys Ala Leu Ile Cys His Ala
275 280 285

Pro Lys Glu Ala Asn Tyr Phe Ile Lys Thr Ala Cys Asn Val Gly Lys
290 295 300

Gly Thr Asn Gly Gln Cys His Cys Ile Gly Gly Asp Val Pro Thr Tyr
305 310 315 320

Phe Asp Tyr Val Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu
325 330 335

Asp Phe Cys Arg Lys Lys Lys Lys Lys Leu Glu Asn Leu Gln Lys Gln
340 345 350

Cys Arg Asp Tyr Glu Gln Asn Leu Tyr Cys Ser Gly Asn Gly Tyr Asp
355 360 365

Cys Thr Lys Thr Ile Tyr Lys Lys Gly Lys Leu Val Ile Gly Glu His
370 375 380

Cys Thr Asn Cys Ser Val Trp Cys Arg Met Tyr Glu Thr Trp Ile Asp
385 390 395 400

Asn Gln Lys Lys Glu Phe Leu Lys Gln Lys Arg Lys Tyr Glu Thr Glu
405 410 415

Ile Ser Gly Gly Gly Ser Gly Lys Ser Pro Lys Arg Thr Lys Arg Ala
420 425 430

Ala Arg Ser Ser Ser Ser Ser Asp Asp Asn Gly Tyr Glu Ser Lys Phe
435 440 445

Tyr Lys Lys Leu Lys Glu Val Gly Tyr Gln Asp Val Asp Lys Phe Leu
450 455 460

Lys Ile Leu Asn Lys Glu Gly Ile Cys Gln Lys Gln Pro Gln Val Gly
465 470 475 480

Asn Glu Lys Ala Asp Asn Val Asp Phe Thr Asn Glu Lys Tyr Val Lys
485 490 495

Thr Phe Ser Arg Thr Glu Ile Cys Glu Pro Cys Pro Trp Cys Gly Leu
500 505 510

Glu Lys Gly Gly Pro Pro Trp Lys Val Lys Gly Asp Lys Thr Cys Gly
515 520 525

Ser Ala Lys Thr Lys Thr Tyr Asp Pro Lys Asn Ile Thr Asp Ile Pro
530 535 540

Val Leu Tyr Pro Asp Lys Ser Gln Gln Asn Ile Leu Lys Lys Tyr Lys
545 550 555 560

Asn Phe Cys Glu Lys Gly Ala Pro Gly Gly Gly Gln Ile Lys Lys Trp
565 570 575

Gln Cys Tyr Tyr Asp Glu His Arg Pro Ser Ser Lys Asn Asn Asn Asn
580 585 590

Cys Val Glu Gly Thr Trp Asp Lys Phe Thr Gln Gly Lys Gln Thr Val
595 600 605

Lys Ser Tyr Asn Val Phe Phe Trp Asp Trp Val His Asp Met Leu His
610 615 620

Asp Ser Val Glu Trp Lys Thr Glu Leu Ser Lys Cys Ile Asn Asn Asn
625 630 635 640

Thr Asn Gly Asn Thr Cys Arg Asn Asn Asn Lys Cys Lys Thr Asp Cys
645 650 655

Gly Cys Phe Gln Lys Trp Val Glu Lys Lys Gln Gln Glu Trp Met Ala
660 665 670

Ile Lys Asp His Phe Gly Lys Gln Thr Asp Ile Val Gln Gln Lys Gly
675 680 685

Leu Ile Val Phe Ser Pro Tyr Gly Val Leu Asp Leu Val Leu Lys Gly
690 695 700

Gly Asn Leu Leu Gln Asn Ile Lys Asp Val His Gly Asp Thr Asp Asp
705 710 715 720

Ile Lys His Ile Lys Lys Leu Leu Asp Glu Glu Asp Ala Val Ala Val
725 730 735

Val Leu Gly Gly Lys Asp Asn Thr Thr Ile Asp Lys Leu Leu Gln His
740 745 750

Glu Lys Glu Gln Ala Glu Gln Cys Lys Gln Lys Gln Glu Glu Cys Glu
755 760 765

Lys Lys Ala Gln Gln Glu Ser Arg Gly Arg Ser Ala Glu Thr Arg Glu
770 775 780

Asp Glu Arg Thr Gln Gln Pro Ala Asp Ser Ala Gly Glu Val Glu Glu
785 790 795 800

Glu Glu Asp Asp Asp Asp Tyr Asp Glu Asp Asp Glu Asp Asp Asp Val
805 810 815

Val Gln Glu Glu Glu Glu Gly Lys Glu Glu Gly Thr Val Thr Glu Val
820 825 830

Thr Glu Val Thr Glu Val Val Glu Glu Thr Val Thr Glu Gln Glu Gly
835 840 845

Val Lys Pro Cys Asp Ile Val Gly Lys Leu Phe Glu Asp Asp Lys Ser
850 855 860

Leu Lys Glu Ala Cys Gly Leu Lys Tyr Gly Pro Gly Gly Lys Glu Lys
865 870 875 880

Phe Pro Asn Trp Lys Cys Val Thr Pro Ser Gly Val Ser Thr Ala Thr
885 890 895

Ser Gly Lys Asp Gly Ala Ile Cys Val Pro Pro Arg Arg Arg Arg Leu
900 905 910

Tyr Val Gly Gly Leu Ser Gln Trp Ala Ser Arg Gly Gly Asp Glu Thr
915 920 925

Thr Glu Val Ser Ser Glu Ala Thr Ser Ala Pro Ser Gln Ser Glu Ser
930 935 940

Glu Lys Leu Arg Thr Ala Phe Ile Glu Ser Ala Ala Ile Glu Thr Phe
945 950 955 960

Phe Leu Trp His Lys Tyr Lys Glu Glu Lys Lys Pro Pro Ala Thr Gln
965 970 975

Asp Gly Ala Gly Leu Gly Val Ser Leu Pro Glu Pro Ser Pro Pro Gly
980 985 990

Glu Asp Pro Gln Thr Gln Leu Gln Gln Thr Gly Val Ile Pro Pro Asp
995 1000 1005

Phe Leu Arg Gln Met Phe Tyr Thr Leu Ala Asp Tyr Lys Asp Ile Leu
1010 1015 1020

Tyr Ser Gly Ser Asn Asp Thr Ser Asp Thr Thr Gly Lys Gln Thr Pro
1025 1030 1035 1040

Ser Ser Ser Asn Asp Asn Leu Lys Asn Ile Val Leu Glu Ala Ser Gly
1045 1050 1055

Ser Thr Glu Gln Glu Lys Glu Lys Met Lys Gln Ile Gln Ala Lys Ile
1060 1065 1070

Lys Lys Ile Leu Asn Gly Ala Thr Ser Gly Val Pro Pro Val Thr Lys
1075 1080 1085

Asn Ser Val Lys Thr Pro Gln Gln Thr Trp Trp Glu Asn Ile Ala Lys
1090 1095 1100

Asp Ile Trp Asn Ala Met Val Cys Ala Leu Thr Tyr Lys Glu Asn Asp
1105 1110 1115 1120

Ala Arg Gly Thr Ser Ala Lys Ile Glu Gln Asn Lys Asp Leu Lys Lys
1125 1130 1135

Ala Leu Trp Asp Glu Ala Asn Lys Asn Thr Pro Ile Glu Lys Tyr Gln
1140 1145 1150

Tyr Thr Asn Val Lys Leu Glu Asp Glu Ser Gly Ala Lys Ser Asn Asp
1155 1160 1165

Thr Ile Gln Pro Pro Thr Leu Lys Asn Phe Val Glu Ile Pro Thr Phe
1170 1175 1180

Phe Arg Trp Leu His Glu Trp Gly Asn Ser Phe Cys Phe Glu Arg Ala
1185 1190 1195 1200

Lys Arg Leu Ala Gln Ile Lys His Glu Cys Met Asp Glu Asp Gly Glu
1205 1210 1215

Lys Gln Tyr Ser Gly Asp Gly Glu Tyr Cys Glu Glu Ile Phe Ser Lys
1220 1225 1230

Gln Tyr Asn Val Leu Gln Asp Leu Ser Ser Ser Cys Ala Lys Pro Cys
1235 1240 1245

Arg Leu Tyr Lys Thr Trp Ile Glu Lys Lys Lys Thr Glu Tyr Glu Lys
1250 1255 1260

Gln Gln Lys Ala Tyr Glu Gln Gln Lys Ser Asn Tyr Glu Asn Glu Gln
1265 1270 1275 1280

Lys Asp Lys Cys Gln Thr Gln Ser Asn Asn Asn Ala Asn Glu Phe Ser
1285 1290 1295

Arg Thr Leu Gly Ala Ser Pro Thr Ala Ala Glu Phe Leu Gln Lys Leu
1300 1305 1310

Gly Ser Cys Lys Asn Asp Asn Gly Tyr Glu Asn Gly Glu Asp Asn Lys
1315 1320 1325

Ile Asp Phe Lys Asn Pro Asp Lys Thr Phe Lys Glu Ala His Ser Cys
1330 1335 1340

Asp Pro Cys Pro Ile Thr Gly Val Lys Cys Gln Asn Gly His Cys Val
1345 1350 1355 1360

Gly Ser Ala Asn Gly Lys Glu Cys Lys Asn Asn Lys Ile Thr Ala Glu
1365 1370 1375

Asp Ile Lys Asn Lys Thr Asp Pro Asn Gly Asn Ile Glu Met Val Val
1380 1385 1390

Ser Asp Asp Ser Thr Asn Thr Phe Glu His Leu Gly Asp Cys Lys Ser
1395 1400 1405

Ser Gly Ile Phe Lys Gly Ile Arg Lys Asp Glu Trp Lys Cys Ala Asn
1410 1415 1420

Val Cys Gly Val Asp Ile Cys Thr Leu Glu Lys Lys Ile Lys Asn Gly
1425 1430 1435 1440

Gln Glu Gly Asp Lys Lys Tyr Ile Thr Met Lys Glu Leu Leu Lys Arg
1445 1450 1455

Trp Leu Glu Tyr Phe Leu Glu Asp Tyr Asn Arg Ile Arg Lys Lys Ile
1460 1465 1470

Lys Leu Cys Thr Lys Lys Glu Asp Gly Cys Lys Cys Ile Lys Gly Cys
1475 1480 1485

Ile Glu Lys Trp Val Gln Glu Lys Thr Lys Glu Trp Gln Lys Ile Asn
1490 1495 1500

Asp Thr Tyr Leu Glu Gln Tyr Lys Asn Asp Asp Gly Asn Thr Leu Thr
1505 1510 1515 1520

Asn Phe Leu Glu Gln Phe Gln Tyr Arg Thr Glu Phe Lys Asn Ala Ile
1525 1530 1535

Lys Pro Cys Asp Gly Leu Asp Gln Phe Lys Thr Ser Cys Gly Leu Asn
1540 1545 1550

Ser Thr Asp Asn Ser Gln Asn Gly Asn Asn Asn Asp Leu Val Leu Cys
1555 1560 1565

Leu Leu Asn Lys Leu Gln Lys Lys Ile Ser Glu Cys Lys Glu Gln His
1570 1575 1580

Ser Gly Gln Thr Gln Thr Pro Cys Asp Asn Ser Ser Leu Ser Gly Lys
1585 1590 1595 1600

Glu Ser Thr Leu Val Glu Asp Val Asp Asp Tyr Glu Glu Gln Asn Pro
1605 1610 1615

Glu Asn Lys Val Glu Gln Pro Lys Phe Cys Pro Asp Met Lys Glu Pro
1620 1625 1630

Lys Lys Glu Asn Asp Glu Glu Val Gly Thr Cys Gly Gly Asp Glu Glu
1635 1640 1645

Lys Lys Lys Val Glu Asp Ser Val Ile Glu Gln Lys Glu Glu Glu Ala
1650 1655 1660

Ala Ser Ala Pro Glu Glu Ser Pro Pro Leu Thr Pro Glu Ala Pro Lys
1665 1670 1675 1680

Lys Glu Glu Asn Val Val Pro Lys Pro Pro Pro Pro Pro Lys Lys Arg
1685 1690 1695

Arg Ile Lys Thr Arg Asn Val Leu Asp His Pro Ala Val Ile Pro Ala
1700 1705 1710

Leu Met Ser Ser Thr Ile Met Trp Ser Ile Gly Ile Gly Phe Ala Ala
1715 1720 1725

Phe Thr Tyr Phe Tyr Leu Lys Lys Lys Thr Lys Ser Ser Val Gly Asn
1730 1735 1740

Leu Phe Gln Ile Leu Gln Ile Pro Lys Ser Asp Tyr Asp Ile Pro Thr
1745 1750 1755 1760

Leu Lys Ser Ser Asn Arg Tyr Ile Pro Tyr Ala Ser Asp Arg His Lys
1765 1770 1775

Gly Lys Thr Tyr Ile Tyr Met Glu Gly Asp Ser Ser Gly Asp Glu Lys
1780 1785 1790

Tyr Ala Phe Met Ser Asp Thr Thr Asp Ile Thr Ser Ser Glu Ser Glu
1795 1800 1805

Tyr Glu Glu Leu Asp Ile Asn Asp Ile Tyr Val Pro Gly Ser Pro Lys
1810 1815 1820

Tyr Lys Thr Leu Ile Glu Val Val Leu Glu Pro Ser Lys Arg Asp Thr
1825 1830 1835 1840

Gln Asn Asp Ile His Asn Asp Ile Pro Ser Asp Ile Pro Asn Ser Asp
1845 1850 1855

Thr Pro Pro Pro Ile Thr Asp Asp Glu Trp Asn Gln Leu Lys Lys Asp
1860 1865 1870

Phe Ile Ser Asn Met Leu Gln Asn Thr Gln Asn Thr Glu Pro Asn Ile
1875 1880 1885

Leu His Asp Asn Val Asp Asn Asn Thr His Pro Thr Met Ser Arg His
1890 1895 1900

Asn Met Asp Gln Lys Pro Phe Ile Met Ser Ile His Asp Arg Asn Leu
1905 1910 1915 1920

Phe Ser Gly Glu Glu Tyr Asn Tyr Asp Met Phe Asn Ser Gly Asn Asn
1925 1930 1935

Pro Ile Asn Ile Ser Asp Ser Thr Asn Ser Met Asp Ser Leu Thr Ser
1940 1945 1950

Asn Asn His Ser Pro Tyr Asn Asp Lys Asn Asp Leu Tyr Ser Gly Ile
1955 1960 1965

Asp Leu Ile Asn Asp Ala Leu Ser Gly Asn His Ile Asp Ile Tyr Asp
1970 1975 1980

Glu Met Leu Lys Arg Lys Glu Asn Glu Leu Phe Gly Thr Gln His His
1985 1990 1995 2000

Pro Lys Asn Ile Thr Ser Asn Arg Val Val Thr Gln Thr Ser Ser Asp
2005 2010 2015

Asp Pro Ile Thr Asn Gln Ile Asn Leu Phe His Lys Trp Leu Asp Arg
2020 2025 2030

His Arg Asp Met Cys Glu Lys Trp Lys Asn Asn His Glu Arg Leu Pro
2035 2040 2045

Lys Leu Lys Glu Leu Trp Glu Asn Glu Thr His Ser Gly Asp Ile Asn
2050 2055 2060

Ser Gly Ile Pro Ser Gly Asn His Val Leu Asn Thr Asp Val Ser Ile
2065 2070 2075 2080

Gln Ile Asp Met Asp Asn Pro Lys Thr Met Asn Glu Phe Thr Asn Met
2085 2090 2095

Asp Thr Asn Pro Asp Lys Ser Thr Met Asp Thr Ile Leu Asp Asp Leu
2100 2105 2110

Glu Lys Tyr Asn Glu Pro Tyr Tyr Tyr Asp Phe Tyr Lys His Asp Ile
2115 2120 2125

Tyr Tyr Asp Val Asn Asp Asp Lys Ala Ser Glu Asp His Ile Asn Met
2130 2135 2140

Asp His Asn Lys Met Asp Asn Asn Asn Ser Asp Val Pro Thr Asn Val
2145 2150 2155 2160

Gln Ile Glu Met Asn Val Ile Asn Asn Gln Glu Leu Leu Gln Asn Glu
2165 2170 2175

Tyr Pro Ile Ser His Met
2180



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ATCGATCAGC TGGGAAGAAA TACTTCATCT 30

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

ATCGATGGGC CCCGAAGTTT GTTCATTATT 30

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

TCTCGTCAGC TGACGATCTC TAGTGCTATT 30

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

ACGAGTGGGC CCTGTCACAA CTTCCTGAGT 30

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

AGACCTCAAT TTCTAAG 17

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

AATCGCGAGC ATCATCTG 18

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

CCRAGRAGRC AARAAYTATG 20

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

CCAWCKKARR AATTGWGG 18

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 291 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Val Cys Ile Pro Asp Arg Arg Tyr Gln Leu Cys Met Lys
20 25 30

Glu Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Asp Phe Cys Lys Asp Ile Arg Trp Ser Leu Gly Asp Phe Gly Asp
85 90 95

Ile Ile Met Gly Thr Asp Met Glu Gly Ile Gly Tyr Ser Lys Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Thr Asp Glu Lys Ala Gln Gln
115 120 125

Arg Arg Lys Gln Trp Trp Asn Glu Ser Lys Ala Gln Ile Trp Thr Ala
130 135 140

Met Met Tyr Ser Val Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
145 150 155 160

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Glu Pro Gln Ile Tyr Arg Trp
165 170 175

Ile Arg Glu Trp Gly Arg Asp Tyr Val Ser Glu Leu Pro Thr Glu Val
180 185 190

Gln Lys Leu Lys Glu Lys Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
195 200 205

Xaa Xaa Cys Xaa Val Pro Pro Cys Gln Asn Ala Cys Lys Ser Tyr Asp
210 215 220

Gln Trp Ile Thr Arg Lys Lys Asn Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
245 250 255

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
275 280 285

Cys Xaa Cys
290

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 271 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:



Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Val Cys Ile Pro Asp Arg Arg Ile Gln Leu Cys
20 25 30

Ile Val Asn Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Phe Cys Asn Asp Leu Lys Asn
65 70 75 80

Ser Phe Leu Asp Tyr Gly His Leu Ala Met Gly Asn Asp Met Asp Phe
85 90 95

Gly Gly Tyr Ser Thr Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Ser Glu His Lys Ile Lys Asn Phe Arg Lys
115 120 125

Glu Trp Trp Asn Glu Phe Arg Glu Lys Leu Trp Glu Ala Met Leu Ser
130 135 140

Glu His Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Glu
145 150 155 160

Leu Gln Ile Thr Gln Trp Ile Lys Glu Trp His Gly Glu Phe Leu Leu
165 170 175

Glu Arg Asp Asn Arg Ser Lys Leu Pro Lys Ser Lys Cys Xaa Xaa Xaa
180 185 190

Xaa Xaa Xaa Xaa Xaa Cys Xaa Glu Lys Glu Cys Ile Asp Pro Cys Met
195 200 205

Lys Tyr Arg Asp Trp Ile Ile Arg Ser Lys Phe Xaa Xaa Xaa Xaa Xaa
210 215 220

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
245 250 255

Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Cys
260 265 270

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 277 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:



Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Val Cys Val Pro Pro Arg Arg
20 25 30

Gln Glu Leu Cys Leu Gly Asn Ile Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Glu Val Cys Lys
65 70 75 80

Ile Ile Asn Lys Thr Phe Ala Asp Ile Arg Asp Ile Ile Gly Gly Thr
85 90 95

Asp Tyr Trp Asn Asp Leu Ser Asn Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asn Lys Lys Asn Asp Lys Leu Phe
115 120 125

Arg Asp Glu Trp Trp Lys Val Ile Lys Lys Asp Val Trp Asn Val Ile
130 135 140

Ser Trp Phe Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa
145 150 155 160

Ile Pro Gln Phe Phe Arg Trp Phe Ser Glu Trp Gly Asp Asp Tyr Cys
165 170 175

Gln Asp Lys Thr Lys Met Ile Glu Thr Leu Lys Val Glu Cys Xaa Xaa
180 185 190

Xaa Xaa Cys Xaa Asp Asp Asn Cys Lys Ser Lys Cys Asn Ser Tyr Lys
195 200 205

Glu Trp Ile Ser Lys Lys Lys Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
210 215 220

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa
245 250 255

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Cys Xaa Xaa Cys
275

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 282 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Val Cys Gly Pro Pro Arg Arg
20 25 30

Gln Gln Leu Cys Leu Gly Tyr Ile Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Ile Cys Asn
65 70 75 80

Ala Ile Leu Gly Ser Tyr Ala Asp Ile Gly Asp Ile Val Arg Gly Leu
85 90 95

Asp Val Trp Arg Asp Ile Asn Thr Asn Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Lys Gln Asn Asp Asn
115 120 125

Asn Glu Arg Asn Lys Trp Trp Glu Lys Gln Arg Asn Leu Ile Trp Ser
130 135 140

Ser Met Val Lys His Ile Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
145 150 155 160

Xaa Xaa Xaa Xaa Ile Pro Gln Phe Leu Arg Trp Leu Lys Glu Trp Gly
165 170 175

Asp Glu Phe Cys Glu Glu Met Gly Thr Glu Val Lys Gln Leu Glu Lys
180 185 190

Ile Cys Xaa Xaa Xaa Xaa Cys Xaa Glu Lys Lys Cys Lys Asn Ala Cys
195 200 205

Ser Ser Tyr Glu Lys Trp Ile Lys Glu Arg Lys Asn Xaa Xaa Xaa Xaa
210 215 220

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
245 250 255

Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys
275 280



50 



(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 324 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ala Cys Ile Pro Pro Arg Arg Gln Lys
20 25 30

Leu Cys Leu His Tyr Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asp Phe Lys Arg Gln Met Phe
85 90 95

Tyr Thr Phe Ala Asp Tyr Arg Asp Ile Cys Leu Gly Thr Asp Ile Ser
100 105 110

Ser Lys Lys Asp Thr Ser Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
115 120 125

Xaa Xaa Xaa Xaa Xaa Lys Ile Ser Asn Ser Ile Arg Tyr Arg Lys Ser
130 135 140

Trp Trp Glu Thr Asn Gly Pro Val Ile Trp Glu Gly Met Leu Cys Ala
145 150 155 160

Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
165 170 175

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
180 185 190

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Arg Pro Gln Phe Leu
195 200 205

Arg Trp Leu Thr Glu Trp Gly Glu Asn Phe Cys Lys Glu Gln Lys Lys
210 215 220

Glu Tyr Lys Val Leu Leu Ala Lys Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Cys Val Ala Cys Lys Asp Gln Cys
245 250 255

Lys Gln Tyr His Ser Trp Ile Gly Ile Trp Ile Asp Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
275 280 285

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
290 295 300

Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys
305 310 315 320

Xaa Xaa Xaa Cys

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 362 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Ala Cys Ala Pro Tyr Arg Arg Leu His Leu Cys Asp Tyr Asn Leu Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20 25 30

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gln Leu Cys Thr Val Leu
50 55 60

Ala Arg Ser Phe Ala Asp Ile Gly Asp Ile Val Arg Gly Lys Asp Leu
65 70 75 80

Tyr Leu Gly Tyr Asp Asn Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
85 90 95

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Gly Gly Asp
115 120 125

Phe Phe Gln Leu Arg Glu Asp Trp Trp Thr Ser Asn Arg Glu Thr Val
130 135 140

Trp Lys Ala Leu Ile Cys His Ala Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
145 150 155 160

Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
165 170 175

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Val Pro Gln Tyr Leu
180 185 190

Arg Trp Phe Glu Glu Trp Ala Glu Asp Phe Cys Arg Lys Lys Lys Lys
195 200 205

Lys Leu Glu Asn Leu Gln Lys Gln Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys
210 215 220

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys
225 230 235 240

Thr Asn Cys Ser Val Trp Cys Arg Met Tyr Glu Thr Trp Ile Asp Asn
245 250 255

Gln Lys Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
275 280 285

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
290 295 300

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
305 310 315 320

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
325 330 335

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
340 345 350

Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys
355 360

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 411 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
1 5 10 15

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20 25 30

Ala Cys Ala Pro Tyr Arg Arg Leu His Val Cys Asp Gln Asn Leu Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gln Ile Cys Thr
85 90 95

Met Leu Ala Arg Ser Phe Ala Asp Ile Gly Asp Ile Val Arg Gly Arg
100 105 110

Asp Leu Tyr Leu Gly Asn Pro Gln Glu Xaa Xaa Xaa Xaa Xaa Xaa Xaa
115 120 125

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
130 135 140

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asn Asp Pro Glu Phe Phe Lys Leu Arg
145 150 155 160

Glu Asp Trp Trp Thr Ala Asn Arg Glu Thr Val Trp Lys Ala Ile Thr
165 170 175

Cys Asn Ala Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa
180 185 190

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
195 200 205

Xaa Xaa Xaa Xaa Val Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala
210 215 220

Glu Asp Phe Cys Arg Lys Lys Asn Lys Lys Ile Lys Asp Val Lys Arg
225 230 235 240

Asn Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa
245 250 255

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Cys Ile Ser Cys Leu Tyr Ala Cys Asn Pro Tyr
275 280 285

Val Asp Trp Ile Asn Asn Gln Lys Glu Xaa Xaa Xaa Xaa Xaa Xaa Xaa
290 295 300

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
305 310 315 320

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
325 330 335

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
340 345 350

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa
355 360 365

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
370 375 380

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
385 390 395 400

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys
405 410

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 411 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20 25 30

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Val Phe Leu Pro Pro Arg Arg Glu His Met Cys Thr Ser Asn
50 55 60

Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
85 90 95

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ala Met Cys Arg Ala Val Arg Tyr
115 120 125

Ser Phe Ala Asp Leu Gly Asp Ile Ile Arg Gly Arg Asp Met Trp Asp
130 135 140

Glu Asp Lys Ser Ser Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
145 150 155 160

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
165 170 175

Xaa Xaa Xaa Xaa Xaa Lys Lys Pro Ala Tyr Lys Lys Leu Arg Ala Asp
180 185 190

Trp Trp Glu Ala Asn Arg His Gln Val Trp Arg Ala Met Lys Cys Ala
195 200 205

Thr Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ile Pro
210 215 220

Gln Arg Leu Arg Trp Met Thr Glu Trp Ala Glu Trp Tyr Cys Lys Ala
225 230 235 240

Gln Ser Gln Glu Tyr Asp Lys Leu Lys Lys Ile Cys Xaa Xaa Xaa Xaa
245 250 255

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Gly
260 265 270

Lys Cys Lys Ala Ala Cys Asp Lys Tyr Lys Glu Glu Ile Glu Lys Trp
275 280 285

Asn Glu Gln Trp Arg Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
290 295 300

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
305 310 315 320

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
325 330 335

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
340 345 350

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys
355 360 365

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
370 375 380

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
385 390 395 400

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys
405 410

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 311 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Ala Cys Met Pro Pro Arg Arg Gln Lys Leu
20 25 30

Cys Leu Tyr Tyr Ile Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gln Phe Leu Arg Ser Met Met
85 90 95

Tyr Thr Phe Gly Asp Tyr Arg Asp Ile Cys Leu Asn Thr Asp Ile Ser
100 105 110

Lys Lys Gln Asn Asp Val Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
115 120 125

Xaa Xaa Xaa Xaa Xaa Ser Lys Ser Pro Ser Gly Leu Ser Arg Gln Glu
130 135 140

Trp Trp Lys Thr Asn Gly Pro Glu Ile Trp Lys Gly Met Leu Cys Ala
145 150 155 160

Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
165 170 175

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
180 185 190

Xaa Xaa Xaa Xaa Xaa Xaa Lys Pro Gln Phe Leu Arg Trp Met Ile Glu
195 200 205

Trp Gly Glu Glu Phe Cys Ala Glu Arg Gln Lys Lys Glu Asn Ile Ile
210 215 220

Lys Asp Ala Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa
225 230 235 240

Lys His Arg Cys Asn Gln Ala Cys Arg Ala Tyr Gln Glu Tyr Val Glu
245 250 255

Asn Lys Lys Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
275 280 285

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys
290 295 300

Xaa Xaa Xaa Xaa Cys Xaa Cys
305 310

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Pro Arg Arg Gln Xaa Leu Cys
1 5

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

CCRAGRAGRC AARAAYTATG 20

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

CCSMGSMGSC AGCAGYTSTG 20

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Phe Ala Asp Xaa Xaa Asp Ile
1 5

(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

TTTGCWGATW WWSGWGATAT 20

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

TTCGCSGATW WCSGSGACAT 20

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

Pro Gln Phe Xaa Arg Trp
1 5

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

CCAWCKKARR AATTGWGG 18

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

CCASCKGWAG AWCTGSGG 18

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Glu Trp Gly Xaa Xaa Xaa Cys
1 5

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

CAAWAWTCWT CWCCCCATTC 20

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

CAGWASTCST CSCCCCACTC 20
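
The oligonucleotide entries above (SEQ ID NOs: 35, 36, 38, 39, 41, 42, 44 and 45) are written with IUPAC ambiguity codes, so each entry stands for a pool of distinct primers. As an illustration only, the following Python sketch counts and enumerates the members of each pool. The primer strings are copied from the listing; the names `IUPAC`, `PRIMERS`, `degeneracy` and `expand` are hypothetical helpers introduced here and are not part of the disclosure.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes used in the listing.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}

# Degenerate primers keyed by SEQ ID NO (spaces in the listing removed).
PRIMERS = {
    35: "CCRAGRAGRCAARAAYTATG",
    36: "CCSMGSMGSCAGCAGYTSTG",
    38: "TTTGCWGATWWWSGWGATAT",
    39: "TTCGCSGATWWCSGSGACAT",
    41: "CCAWCKKARRAATTGWGG",
    42: "CCASCKGWAGAWCTGSGG",
    44: "CAAWAWTCWTCWCCCCATTC",
    45: "CAGWASTCSTCSCCCCACTC",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence in the degenerate pool."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


if __name__ == "__main__":
    for seq_id, primer in sorted(PRIMERS.items()):
        print(seq_id, len(primer), degeneracy(primer))
```

Running the script prints, for each SEQ ID NO, the primer length and the size of its pool; `expand` is only practical for small pools such as these.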
WE CLAIM:

1. A composition comprising a nucleotide sequence of the DBL gene family, wherein said nucleotide 
sequence is selected from the group consisting of the var-1, var-2, var-3 and var-7 genes. 

2. The composition of Claim 1, wherein the nucleotide sequence of the var-1, var-2, var-3 or var-7
gene encodes a cysteine-rich domain homologous to a cysteine-rich domain of a Duffy Antigen Binding Protein (DABP)
derived from Plasmodium vivax and a Sialic Acid Binding Protein (SABP) derived from Plasmodium falciparum.

3. The composition of Claim 1, wherein the nucleotide sequence of the var-1, var-2, var-3 or var-7
gene encodes a cysteine-rich interdomain region between a first domain and a second domain.

4. The composition of Claim 1, wherein the nucleotide sequence is derived from a coding region of
SEQ ID NO:13 or SEQ ID NO:15.

5. A composition comprising a polypeptide encoded by a nucleotide sequence of the DBL gene family,
wherein said polypeptide is encoded by a var-1, var-2, var-3 or var-7 gene.

6. The composition of claim 5, wherein the polypeptide comprises a sequence of amino acid residues
homologous to cysteine-rich domains of a Duffy Antigen Binding Protein (DABP) derived from Plasmodium vivax and
a Sialic Acid Binding Protein (SABP) derived from Plasmodium falciparum.

7. The composition of claim 5, wherein the polypeptide comprises a sequence of about 300 to 400
amino acid residues occurring in the cysteine-rich interdomain region between a first domain and a second domain of
a polypeptide encoded by the var-1, var-2, var-3 or var-7 gene.

8. The composition of claim 5, wherein the polypeptide comprises a sequence of amino acid residues
of SEQ ID NO:14 or SEQ ID NO:16.

9. The composition of claim 5, wherein the polypeptide comprises a sequence of about 50 to about
325 amino acid residues of SEQ ID NO:14 or SEQ ID NO:16.

10. The composition of claim 5, wherein the polypeptide comprises a sequence of about 75 to about
300 amino acid residues of SEQ ID NO:14 or SEQ ID NO:16.

11. The composition of claim 5, wherein the polypeptide comprises a sequence of about 100 to about
250 amino acid residues of SEQ ID NO:14 or SEQ ID NO:16.

12. The composition of claim 5, further comprising a pharmaceutically acceptable carrier and an
isolated Duffy Antigen Binding Protein (DABP) binding domain polypeptide, a Sialic Acid Binding Protein (SABP)
binding domain polypeptide, or a combination thereof, in an amount sufficient to induce a protective immune response
to Plasmodium merozoites in a mammal.

13. The composition of any of the preceding claims for use in inducing a protective immune response 
to Plasmodium merozoites in a mammal. 

14. Use of the composition of any one of claims 1-12 in the preparation of a medicament for inducing 
a protective immune response to Plasmodium merozoites in a mammal. 

15. A method of inducing a protective immune response to Plasmodium merozoites in a mammal,
comprising administering to a mammal an immunologically effective amount of a pharmaceutical composition
comprising a pharmaceutically acceptable carrier and an isolated cysteine-rich polypeptide encoded by a var gene
selected from the group of genes consisting of var-1, var-2, var-3 and var-7 genes.

16. The method of claim 15, further comprising administering to said mammal an immunologically 
effective amount of a Duffy Antigen Binding Protein (DABP) binding domain polypeptide, a Sialic Acid Binding Protein 
(SABP) binding domain polypeptide, or a combination thereof. 
FIG. 3

Amino acid sequences and the synthetic oligonucleotide primers

UNIEBP5 and 5A: P R R Q K/E L C

UNIEBP5, for A+T biased codon usage:
CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG

UNIEBP5A, for G+C biased codon usage:
CC(C/G)-(C/A)G(C/G)-(C/A)G(C/G)-CAG-CAG-(C/T)T(C/G)-TG

UNIEBP5 B and C: F A D I/Y G/R D I

UNIEBP5B, for A+T biased codon usage:
TTT-GC(A/T)-GAT-(A/T)(A/T)(A/T)-(C/G)G(A/T)-GAT-AT

UNIEBP5C, for G+C biased codon usage:
TTC-GC(G/C)-GAT-(A/T)(A/T)C-(G/C)G(G/C)-GAC-AT

UNIEBP3 and 3A: P Q F L/F R W

UNIEBP3, for A+T biased codon usage:
CCA-(A/T)C(T/G)-(T/G)A(A/G)-(A/G)AA-TTG-(A/T)GG

UNIEBP3A, for G+C biased codon usage:
CCA-(C/G)C(G/T)-G(A/T)A-GA(A/T)-CTG-(C/G)GG

UNIEBP3 B and C: E W G D/E D/E Y/F C

UNIEBP3B, for A+T biased codon usage:
CA-A(A/T)A-(A/T)TC-(A/T)TC-(A/T)CC-CCA-TTC

UNIEBP3C, for G+C biased codon usage:
CA-G(A/T)A-(G/C)TC-(G/C)TC-(G/C)CC-CCA-CTC
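
FIG. 3 writes each degenerate position as a bracketed pair such as (A/G), whereas the sequence listing writes the same primers with single-letter IUPAC codes. The following Python sketch, offered only as an illustration, converts the bracketed notation to IUPAC letters so that a figure entry can be compared with its SEQ ID NO. The function name `to_iupac` and the lookup table are hypothetical, and the example strings are the UNIEBP5, UNIEBP5C and UNIEBP3 primers as they read in SEQ ID NOs: 35, 39 and 41.

```python
import re

# Two-base ambiguity sets and their IUPAC single-letter codes.
PAIR_TO_IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def to_iupac(notation):
    """Convert 'CC(A/G)-AG...' style notation to an IUPAC string."""
    compact = notation.replace("-", "")  # drop the codon separators

    def repl(match):
        return PAIR_TO_IUPAC[frozenset(match.group(1) + match.group(2))]

    return re.sub(r"\(([ACGT])/([ACGT])\)", repl, compact)


if __name__ == "__main__":
    print(to_iupac("CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG"))
```

The order of the two bases inside a bracket does not matter, since (A/G) and (G/A) both map to R.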